The Visual Computer, Volume 30, Issue 10, pp 1107–1122

XKin: an open source framework for hand pose and gesture recognition using kinect

  • Fabrizio Pedersoli
  • Sergio Benini
  • Nicola Adami
  • Riccardo Leonardi
Original Article


This work targets real-time recognition of both static hand poses and dynamic hand gestures in a unified open-source framework. The developed solution enables natural and intuitive hand-pose recognition of American Sign Language (ASL), extending the recognition to ambiguous letters not addressed by previous work. While hand-pose recognition exploits techniques working on depth information using texture-based descriptors, gesture recognition evaluates hand trajectories in the depth stream using angular features and hidden Markov models (HMMs). Although the classifiers come pre-trained on the ASL alphabet and 16 uni-stroke dynamic gestures, users can extend these default sets by adding their own personalized poses and gestures. The accuracy and robustness of the recognition system have been evaluated using a publicly available database and across many users. The XKin open project is available online (Pedersoli, XKin libraries, 2013) under the FreeBSD License for researchers in human–machine interaction.
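
As a rough illustration of what "angular features" of a hand trajectory can look like, the sketch below quantizes the direction of motion between consecutive trajectory points into a small set of discrete symbols, the kind of observation sequence a discrete HMM could consume. This is only a minimal sketch under stated assumptions: the abstract does not specify the exact feature definition, so the function name `angular_features`, the 2D point input, and the choice of 8 direction bins are all hypothetical and are not taken from the XKin library.

```python
import math


def angular_features(points, n_bins=8):
    """Quantize the direction of motion between consecutive 2D points.

    points: list of (x, y) hand positions (hypothetical input format).
    n_bins: number of direction codewords (an assumption, not from the paper).
    Returns a list of integer symbols in [0, n_bins).
    """
    width = 2 * math.pi / n_bins
    symbols = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        if dx == 0 and dy == 0:
            continue  # no motion between these samples, so no direction
        # Direction of motion mapped into [0, 2*pi)
        angle = math.atan2(dy, dx) % (2 * math.pi)
        # Bins are centred on multiples of `width`; wrap the last half-bin to 0
        symbols.append(int((angle + width / 2) // width) % n_bins)
    return symbols


# A square traced as right, up, left, down (y axis pointing up)
square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
print(angular_features(square))
```

Each stroke of the square maps to a distinct direction symbol, so two trajectories with the same shape but different position or size yield the same symbol sequence, which is one reason direction-based features are a common choice for uni-stroke gestures.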


Keywords: Kinect, Hand pose, Gesture recognition, Open-source, XKin, Human computer interaction


  1. 3Gear systems: Gestural user interfaces. (2013)
  2. American sign language. (2013)
  3. Gibson Hasbrouck & Associates. (2013)
  4. Biswas, K., Basu, S.: Gesture recognition using Microsoft Kinect™. In: 2011 5th International Conference on Automation, Robotics and Applications (ICARA), pp. 100–103 (2011). doi: 10.1109/ICARA.2011.6144864
  5. Comaniciu, D., Meer, P.: Mean shift: a robust approach toward feature space analysis. IEEE Trans. Pattern Anal. Mach. Intell. 24(5), 603–619 (2002). doi: 10.1109/34.1000236
  6. Daugman, J.G.: Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters. J. Opt. Soc. Am. A Opt. Image Sci. Vis. 2(7), 1160–1169 (1985)
  7. Doliotis, P., Athitsos, V., Kosmopoulos, D.I., Perantonis, S.J.: Hand shape and 3D pose estimation using depth data from a single cluttered frame. In: Bebis, G., Boyle, R., Parvin, B., Koracin, D., Fowlkes, C., Wang, S., Choi, M.H., Mantler, S., Schulze, J.P., Acevedo, D., Mueller, K., Papka, M.E. (eds.) International Symposium on Visual Computing (ISVC). Springer (2012)
  8. Doliotis, P., Stefan, A., McMurrough, C., Eckhard, D., Athitsos, V.: Comparing gesture recognition accuracy using color and depth information. In: Proceedings of the 4th International Conference on Pervasive Technologies Related to Assistive Environments, PETRA ’11, pp. 20:1–20:7. ACM (2011). doi: 10.1145/2141622.2141647
  9. Escalera, S., González, J., Baró, X., Reyes, M., Lopes, O., Guyon, I., Athitsos, V., Escalante, H.: Multi-modal gesture recognition challenge 2013: Dataset and results. In: Proceedings of the 15th ACM on International Conference on Multimodal Interaction, pp. 445–452 (2013)
  10. Guyon, I., Athitsos, V., Jangyodsuk, P., Hamner, B., Escalante, H.: Chalearn gesture challenge: Design and first results. In: 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 1–6 (2012). doi: 10.1109/CVPRW.2012.6239178
  11. Keskin, C., Kirac, F., Kara, Y., Akarun, L.: Hand pose estimation and hand shape classification using multi-layered randomized decision forests. In: ECCV12, pp. VI: 852–863 (2012)
  12. Le, T.L., Nguyen, V.N., Tran, T.T.H., Nguyen, V.T., Nguyen, T.T.: Temporal gesture segmentation for recognition. In: 2013 International Conference on Computing, Management and Telecommunications (ComManTel), pp. 369–373 (2013). doi: 10.1109/ComManTel.6482422
  13. Li, Y.: Hand gesture recognition using Kinect. In: 2012 IEEE 3rd International Conference on Software Engineering and Service Science (ICSESS), pp. 196–199 (2012). doi: 10.1109/ICSESS.2012.6269439
  14. Liang, H., Yuan, J., Thalmann, D., Zhang, Z.: Model-based hand pose estimation via spatial-temporal hand parsing and 3D fingertip localization. Vis. Comput. 29(6–8), 837–848 (2013). doi: 10.1007/s00371-013-0822-4
  15. Liddell, S., Johnson, R.E.: American sign language: compound formation processes, lexicalization, and phonological remnants. Nat. Lang. Ling. Theory 4, 445–513 (1986)
  16. Microsoft Kinect for Windows. (2013)
  17. Mihail, R.P., Jacobs, N., Goldsmith, J.: Real time gesture recognition with 2 Kinect sensors. In: International Conference on Image Processing, Computer Vision, and Pattern Recognition (IPCV) (2012)
  18. Myers, C.S., Rabiner, L.R.: Comparative study of several dynamic time-warping algorithms for connected-word recognition. Bell Syst. Tech. J. 60(7) (1981)
  19. Oikonomidis, I., Kyriazis, N., Argyros, A.: Efficient model-based 3D tracking of hand articulations using Kinect. In: British Machine Vision Conference, pp. 101.1–101.11. British Machine Vision Association (2011). doi: 10.5244/C.25.101
  20. Libfreenect. (2013)
  21. Standard framework for 3D sensing. (2013)
  22. Otsu, N.: A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern. 9(1), 62–66 (1979)
  23. Pedersoli, F.: XKin libraries. (2013)
  24. Pedersoli, F., Adami, N., Benini, S., Leonardi, R.: XKin - eXtendable hand pose and gesture recognition library for Kinect. In: Proceedings of ACM Conference on Multimedia 2012 - Open Source Competition, Nara (2012)
  25. Peris, M., Fukui, K.: Both-hand gesture recognition based on komsm with volume subspaces for robot teleoperation. In: IEEE-Cyber (2012)
  26. PrimeSense: NiTE. (2013)
  27. PrimeSense: sensing and natural interaction. (2013)
  28. Pugeault, N., Bowden, R.: Spelling it out: real-time ASL fingerspelling recognition. IEEE International Conference on Computer Vision Workshops, ICCV, vol. 2011, pp. 1114–1119 (2011)
  29. Rabiner, L.R.: A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE 77(2), 257–286 (1989). doi: 10.1109/5.18626
  30. Ren, Z., Meng, J., Yuan, J., Zhang, Z.: Robust hand gesture recognition with Kinect sensor. In: Proceedings of the 19th ACM international conference on Multimedia, MM ’11, pp. 759–760. ACM (2011). doi: 10.1145/2072298.2072443
  31. Ren, Z., Yuan, J., Meng, J., Zhang, Z.: Robust part-based hand gesture recognition using kinect sensor. IEEE Trans. Multimedia 15(5), 1110–1120 (2013). doi: 10.1109/TMM.2013.2246148
  32. Robot Operating System. (2013)
  33. Rubine, D.: Specifying gestures by example. SIGGRAPH Comput. Graph. 25(4), 329–337 (1991). doi: 10.1145/127719.122753
  34. $1 Unistroke Recognizer. (2013)
  35. Shotton, J., Fitzgibbon, A., Cook, M., Sharp, T., Finocchio, M., Moore, R., Kipman, A., Blake, A.: Real-time human pose recognition in parts from single depth images. In: Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition, CVPR ’11, pp. 1297–1304. IEEE Computer Society, Washington, DC, USA (2011). doi: 10.1109/CVPR.2011.5995316
  36. Uebersax, D., Gall, J., den Bergh, M.V., Gool, L.J.V.: Real-time sign language letter and word recognition from depth data. IEEE International Conference on Computer Vision Workshops, ICCV, vol. 2011, pp. 383–390 (2011)
  37. Wachs, J.P., Kölsch, M., Stern, H., Edan, Y.: Vision-based hand-gesture applications. Commun. ACM 54(2), 60–71 (2011). doi: 10.1145/1897816.1897838
  38. Wan, T., Wang, Y., Li, J.: Hand gesture recognition system using depth data. In: Consumer Electronics, Communications and Networks (CECNet), 2012 2nd International Conference on, pp. 1063–1066 (2012). doi: 10.1109/CECNet.6201837
  39. Wang, R., Paris, S., Popović, J.: 6d hands: markerless hand-tracking for computer aided design. In: Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology, UIST ’11, pp. 549–558. ACM, New York (2011). doi: 10.1145/2047196.2047269
  40. Wobbrock, J.O., Wilson, A.D., Li, Y.: Gestures without libraries, toolkits or training: a $1 recognizer for user interface prototypes. In: Proceedings of the 20th Annual ACM Symposium on User Interface Software and Technology, UIST ’07, pp. 159–168. ACM, New York (2007). doi: 10.1145/1294211.1294238
  41. Zhang, H.J., Kankanhalli, A., Smoliar, S.: Automatic partitioning of full-motion video. Multimedia Syst. 1(1), 10–28 (1993). doi: 10.1007/BF01210504

Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Fabrizio Pedersoli (1)
  • Sergio Benini (1)
  • Nicola Adami (1)
  • Riccardo Leonardi (1)
  1. Department of Information Engineering, University of Brescia, Brescia, Italy
