XKin: an open source framework for hand pose and gesture recognition using kinect

  • Original Article
The Visual Computer

Abstract

This work targets real-time recognition of both static hand poses and dynamic hand gestures in a unified open-source framework. The developed solution enables natural and intuitive hand-pose recognition of American Sign Language (ASL), extending the recognition to ambiguous letters not addressed by previous work. Hand-pose recognition works on depth information using texture-based descriptors, whereas gesture recognition evaluates hand trajectories in the depth stream using angular features and hidden Markov models (HMMs). Although the classifiers come already trained on the ASL alphabet and 16 uni-stroke dynamic gestures, users can extend these default sets by adding their own poses and gestures. The accuracy and robustness of the recognition system have been evaluated using a publicly available database and across many users. The XKin open project is available online (Pedersoli, XKin libraries. https://github.com/fpeder/XKin, 2013) under the FreeBSD License for researchers in human–machine interaction.
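
The abstract does not spell out how the angular features are computed or how the HMMs are structured, so the following Python sketch is only an illustration of the general idea, not the XKin implementation or its API. It assumes a 2D hand trajectory, direction angles quantized into 8 symbols, and two hand-written 2-state left-to-right HMMs; every function name, the bin count, and all probabilities are hypothetical.

```python
import math

def angular_features(points):
    """Direction angle (radians, in [0, 2*pi)) of each step of a 2D trajectory."""
    angles = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        if dx == 0 and dy == 0:
            continue  # repeated point: direction is undefined, skip it
        angles.append(math.atan2(dy, dx) % (2 * math.pi))
    return angles

def quantize(angles, n_bins=8):
    """Map each angle to one of n_bins direction symbols (bin 0 is centred on 0 rad)."""
    width = 2 * math.pi / n_bins
    return [int(((a + width / 2) % (2 * math.pi)) // width) % n_bins for a in angles]

def forward_log_likelihood(symbols, start, trans, emit):
    """Scaled forward algorithm: log P(symbols | discrete HMM)."""
    n = len(start)
    alpha = [start[i] * emit[i][symbols[0]] for i in range(n)]
    log_p = 0.0
    for t in range(len(symbols)):
        if t > 0:
            alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][symbols[t]]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        log_p += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_p

def peaked(symbol, n_bins=8, p=0.86):
    """Hypothetical emission row: probability p on one symbol, the rest spread evenly."""
    rest = (1.0 - p) / (n_bins - 1)
    return [p if k == symbol else rest for k in range(n_bins)]

# Two illustrative left-to-right models (all parameters are made up for this sketch).
START = [1.0, 0.0]
TRANS = [[0.5, 0.5], [0.0, 1.0]]
RIGHT_THEN_UP = (START, TRANS, [peaked(0), peaked(2)])
UP_THEN_RIGHT = (START, TRANS, [peaked(2), peaked(0)])

# An L-shaped stroke: two steps to the right, then two steps up.
stroke = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
symbols = quantize(angular_features(stroke))
ll_a = forward_log_likelihood(symbols, *RIGHT_THEN_UP)
ll_b = forward_log_likelihood(symbols, *UP_THEN_RIGHT)
best = "right_then_up" if ll_a > ll_b else "up_then_right"
```

In this toy setup the stroke is assigned to whichever model gives the higher log-likelihood. A trained recognizer would estimate the transition and emission probabilities from example gestures rather than fixing them by hand.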

References

  1. 3Gear systems: Gestural user interfaces. http://www.threegear.com/ (2013)

  2. American sign language. http://en.wikipedia.org/wiki/American_Sign_Language (2013)

  3. Gibson Hasbrouck & Associates. http://www.gha-pd.com/ (2013)

  4. Biswas, K., Basu, S.: Gesture recognition using Microsoft Kinect™. In: 2011 5th International Conference on Automation, Robotics and Applications (ICARA), pp. 100–103 (2011). doi:10.1109/ICARA.2011.6144864

  5. Comaniciu, D., Meer, P.: Mean shift: a robust approach toward feature space analysis. IEEE Trans. Pattern Anal. Mach. Intell. 24(5), 603–619 (2002). doi:10.1109/34.1000236

  6. Daugman, J.G.: Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters. J. Opt. Soc. Am. A Opt. Image Sci. Vis. 2(7), 1160–1169 (1985)

  7. Doliotis, P., Athitsos, V., Kosmopoulos, D.I., Perantonis, S.J.: Hand shape and 3D pose estimation using depth data from a single cluttered frame. In: Bebis, G., Boyle, R., Parvin, B., Koracin, D., Fowlkes, C., Wang, S., Choi, M.H., Mantler, S., Schulze, J.P., Acevedo, D., Mueller, K., Papka, M.E. (eds.) International Symposium on Visual Computing (ISVC). Springer (2012)

  8. Doliotis, P., Stefan, A., McMurrough, C., Eckhard, D., Athitsos, V.: Comparing gesture recognition accuracy using color and depth information. In: Proceedings of the 4th International Conference on Pervasive Technologies Related to Assistive Environments, PETRA ’11, pp. 20:1–20:7. ACM (2011). doi:10.1145/2141622.2141647

  9. Escalera, S., Gonzàlez, J., Baró, X., Reyes, M., Lopes, O., Guyon, I., Athitsos, V., Escalante, H.: Multi-modal gesture recognition challenge 2013: Dataset and results. In: Proceedings of the 15th ACM on International Conference on Multimodal Interaction, pp. 445–452 (2013). http://dl.acm.org/citation.cfm?id=2532595

  10. Guyon, I., Athitsos, V., Jangyodsuk, P., Hamner, B., Escalante, H.: Chalearn gesture challenge: Design and first results. In: 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 1–6 (2012). doi:10.1109/CVPRW.2012.6239178

  11. Keskin, C., Kirac, F., Kara, Y., Akarun, L.: Hand pose estimation and hand shape classification using multi-layered randomized decision forests. In: ECCV12, pp. VI: 852–863 (2012)

  12. Le, T.L., Nguyen, V.N., Tran, T.T.H., Nguyen, V.T., Nguyen, T.T.: Temporal gesture segmentation for recognition. In: 2013 International Conference on Computing, Management and Telecommunications (ComManTel), pp. 369–373 (2013). doi:10.1109/ComManTel.6482422

  13. Li, Y.: Hand gesture recognition using Kinect. In: 2012 IEEE 3rd International Conference on Software Engineering and Service Science (ICSESS), pp. 196–199 (2012). doi:10.1109/ICSESS.2012.6269439

  14. Liang, H., Yuan, J., Thalmann, D., Zhang, Z.: Model-based hand pose estimation via spatial-temporal hand parsing and 3D fingertip localization. Vis. Comput. 29(6–8), 837–848 (2013). doi:10.1007/s00371-013-0822-4

  15. Liddell, S., Johnson, R.E.: American Sign Language: compound formation processes, lexicalization, and phonological remnants. Nat. Lang. Ling. Theory 4, 445–513 (1986)

  16. Microsoft Kinect for Windows. http://www.microsoft.com/en-us/kinectforwindows (2013)

  17. Mihail, R.P., Jacobs, N., Goldsmith, J.: Real time gesture recognition with 2 Kinect sensors. In: International Conference on Image Processing, Computer Vision, and Pattern Recognition (IPCV) (2012)

  18. Myers, C.S., Rabiner, L.R.: A comparative study of several dynamic time-warping algorithms for connected-word recognition. Bell Syst. Tech. J. 60(7) (1981)

  19. Oikonomidis, I., Kyriazis, N., Argyros, A.: Efficient model-based 3D tracking of hand articulations using Kinect. In: British Machine Vision Conference, pp. 101.1–101.11. British Machine Vision Association (2011). doi:10.5244/C.25.101

  20. Libfreenect. http://openkinect.org (2013)

  21. Standard framework for 3D sensing. http://www.openni.org (2013)

  22. Otsu, N.: A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern. 9(1), 62–66 (1979)

  23. Pedersoli, F.: XKin libraries. https://github.com/fpeder/XKin (2013)

  24. Pedersoli, F., Adami, N., Benini, S., Leonardi, R.: XKin - eXtendable hand pose and gesture recognition library for Kinect. In: Proceedings of ACM Conference on Multimedia 2012—Open Source Competition, Nara (2012)

  25. Peris, M., Fukui, K.: Both-hand gesture recognition based on komsm with volume subspaces for robot teleoperation. In: IEEE-Cyber (2012)

  26. PrimeSense: NiTE. http://www.primesense.com/nite (2013)

  27. PrimeSense: sensing and natural interaction. http://www.primesense.com (2013)

  28. Pugeault, N., Bowden, R.: Spelling it out: real-time ASL fingerspelling recognition. In: IEEE International Conference on Computer Vision Workshops, ICCV, vol. 2011, pp. 1114–1119 (2011)

  29. Rabiner, L.R.: A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE 77(2), 257–286 (1989). doi:10.1109/5.18626

  30. Ren, Z., Meng, J., Yuan, J., Zhang, Z.: Robust hand gesture recognition with Kinect sensor. In: Proceedings of the 19th ACM international conference on Multimedia, MM ’11, pp. 759–760. ACM (2011). doi:10.1145/2072298.2072443

  31. Ren, Z., Yuan, J., Meng, J., Zhang, Z.: Robust part-based hand gesture recognition using kinect sensor. IEEE Trans. Multimedia 15(5), 1110–1120 (2013). doi:10.1109/TMM.2013.2246148

  32. Robot Operating System. http://www.ros.org/wiki/ (2013)

  33. Rubine, D.: Specifying gestures by example. SIGGRAPH Comput. Graph. 25(4), 329–337 (1991). doi:10.1145/127719.122753

  34. $1 Unistroke Recognizer. http://depts.washington.edu/aimgroup/proj/dollar/ (2013)

  35. Shotton, J., Fitzgibbon, A., Cook, M., Sharp, T., Finocchio, M., Moore, R., Kipman, A., Blake, A.: Real-time human pose recognition in parts from single depth images. In: Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition, CVPR ’11, pp. 1297–1304. IEEE Computer Society, Washington, DC, USA (2011). doi:10.1109/CVPR.2011.5995316

  36. Uebersax, D., Gall, J., den Bergh, M.V., Gool, L.J.V.: Real-time sign language letter and word recognition from depth data. In: IEEE International Conference on Computer Vision Workshops, ICCV, vol. 2011, pp. 383–390 (2011)

  37. Wachs, J.P., Kölsch, M., Stern, H., Edan, Y.: Vision-based hand-gesture applications. Commun. ACM 54(2), 60–71 (2011). doi:10.1145/1897816.1897838

  38. Wan, T., Wang, Y., Li, J.: Hand gesture recognition system using depth data. In: Consumer Electronics, Communications and Networks (CECNet), 2012 2nd International Conference on, pp. 1063–1066 (2012). doi:10.1109/CECNet.6201837

  39. Wang, R., Paris, S., Popović, J.: 6D hands: markerless hand-tracking for computer aided design. In: Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology, UIST ’11, pp. 549–558. ACM, New York (2011). doi:10.1145/2047196.2047269

  40. Wobbrock, J.O., Wilson, A.D., Li, Y.: Gestures without libraries, toolkits or training: a $1 recognizer for user interface prototypes. In: Proceedings of the 20th Annual ACM Symposium on User Interface Software and Technology, UIST ’07, pp. 159–168. ACM, New York (2007). doi:10.1145/1294211.1294238

  41. Zhang, H.J., Kankanhalli, A., Smoliar, S.: Automatic partitioning of full-motion video. Multimedia Syst. 1(1), 10–28 (1993). doi:10.1007/BF01210504

Author information

Corresponding author

Correspondence to Fabrizio Pedersoli.

About this article

Cite this article

Pedersoli, F., Benini, S., Adami, N. et al. XKin: an open source framework for hand pose and gesture recognition using kinect. Vis Comput 30, 1107–1122 (2014). https://doi.org/10.1007/s00371-014-0921-x

  • DOI: https://doi.org/10.1007/s00371-014-0921-x
