XKin: an open source framework for hand pose and gesture recognition using kinect

Pedersoli, Fabrizio; Benini, Sergio; Adami, Nicola; Leonardi, Riccardo

doi:10.1007/s00371-014-0921-x

Original Article
Published: 31 January 2014

Volume 30, pages 1107–1122, (2014)
The Visual Computer

Abstract

This work targets real-time recognition of both static hand poses and dynamic hand gestures in a unified open-source framework. The developed solution enables natural and intuitive recognition of American Sign Language (ASL) hand poses, extending recognition to ambiguous letters not addressed by previous work. Hand-pose recognition applies texture-based descriptors to depth information, whereas gesture recognition evaluates hand trajectories in the depth stream using angular features and hidden Markov models (HMMs). Although the classifiers come pre-trained on the ASL alphabet and 16 uni-stroke dynamic gestures, users can extend these default sets with their own personalized poses and gestures. The accuracy and robustness of the recognition system have been evaluated on a publicly available database and across many users. The XKin open project is available online (Pedersoli, XKin libraries. https://github.com/fpeder/XKin, 2013) under the FreeBSD License for researchers in human–machine interaction.
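The gesture branch described in the abstract (angular features from a hand trajectory, scored by HMMs) can be illustrated with a minimal Python sketch. This is not the XKin API or the authors' implementation: the function names, the 8-bin angle quantization, the y-up coordinate convention, and the hand-set two-state HMM parameters are all assumptions chosen for illustration. A real system would learn the HMM parameters from training gestures.

```python
import math


def angular_symbols(points, n_bins=8):
    """Quantize the direction of each trajectory step into one of n_bins symbols.

    Assumes (x, y) points with y pointing up; symbol 0 is "right", and
    symbols increase counter-clockwise.
    """
    width = 2 * math.pi / n_bins
    symbols = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        if dx == 0 and dy == 0:
            continue  # a stationary frame has no direction
        angle = math.atan2(dy, dx) % (2 * math.pi)
        # Shift by half a bin so each bin is centred on its direction.
        symbols.append(int((angle + width / 2) // width) % n_bins)
    return symbols


def log_likelihood(symbols, start, trans, emit):
    """Scaled forward algorithm: returns log P(symbols | HMM)."""
    if not symbols:
        raise ValueError("empty observation sequence")
    n = len(start)
    alpha = [start[i] * emit[i][symbols[0]] for i in range(n)]
    log_p = 0.0
    for t in range(len(symbols)):
        if t > 0:
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][symbols[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        log_p += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_p


def make_model(first_symbol, second_symbol, n_bins=8):
    """Hypothetical hand-set left-to-right HMM for a two-stroke gesture."""
    def emission(preferred):
        # 0.86 on the preferred direction, 0.02 on each of the other 7 bins.
        return [0.86 if s == preferred else 0.02 for s in range(n_bins)]

    start = [1.0, 0.0]
    trans = [[0.8, 0.2], [0.0, 1.0]]
    emit = [emission(first_symbol), emission(second_symbol)]
    return start, trans, emit


def classify(points, models):
    """Return the name of the model with the highest log-likelihood."""
    symbols = angular_symbols(points)
    return max(models, key=lambda name: log_likelihood(symbols, *models[name]))


# Two toy gestures: an "L" drawn right-then-up, and one drawn up-then-right.
MODELS = {
    "right_then_up": make_model(0, 2),
    "up_then_right": make_model(2, 0),
}

trajectory = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (3, 3)]
print(angular_symbols(trajectory))
print(classify(trajectory, MODELS))
```

Quantizing step directions makes the feature invariant to where the gesture starts and to its overall size, which is one reason angular features pair naturally with discrete-observation HMMs. The bin count and the two-state topology here are illustrative choices only.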

References

3Gear systems: Gestural user interfaces. http://www.threegear.com/ (2013)
American sign language. http://en.wikipedia.org/wiki/AmericanLanguage (2013)
Gibson Hasbrouck & Associates. http://www.gha-pd.com/ (2013)
Biswas, K., Basu, S.: Gesture recognition using Microsoft Kinect™. In: 2011 5th International Conference on Automation, Robotics and Applications (ICARA), pp. 100–103 (2011). doi:10.1109/ICARA.2011.6144864
Comaniciu, D., Meer, P.: Mean shift: a robust approach toward feature space analysis. IEEE Trans. Pattern Anal. Mach. Intell. 24(5), 603–619 (2002). doi:10.1109/34.1000236
Daugman, J.G.: Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters. J. Opt. Soc. Am. A Opt. Image Sci. Vis. 2(7), 1160–1169 (1985)
Doliotis, P., Athitsos, V., Kosmopoulos, D.I., Perantonis, S.J.: Hand shape and 3D pose estimation using depth data from a single cluttered frame. In: Bebis, G., Boyle, R., Parvin, B., Koracin, D., Fowlkes, C., Wang, S., Choi, M.H., Mantler, S., Schulze, J.P., Acevedo, D., Mueller, K., Papka, M.E. (eds.) International Symposium on Visual Computing (ISVC). Springer (2012)
Doliotis, P., Stefan, A., McMurrough, C., Eckhard, D., Athitsos, V.: Comparing gesture recognition accuracy using color and depth information. In: Proceedings of the 4th International Conference on Pervasive Technologies Related to Assistive Environments, PETRA ’11, pp. 20:1–20:7. ACM (2011). doi:10.1145/2141622.2141647
Escalera, S., Gonzàlez, J., Baró, X., Reyes, M., Lopes, O., Guyon, I., Athitsos, V., Escalante, H.: Multi-modal gesture recognition challenge 2013: Dataset and results. In: Proceedings of the 15th ACM on International Conference on Multimodal Interaction, pp. 445–452 (2013). http://dl.acm.org/citation.cfm?id=2532595
Guyon, I., Athitsos, V., Jangyodsuk, P., Hamner, B., Escalante, H.: Chalearn gesture challenge: Design and first results. In: 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 1–6 (2012). doi:10.1109/CVPRW.2012.6239178
Keskin, C., Kirac, F., Kara, Y., Akarun, L.: Hand pose estimation and hand shape classification using multi-layered randomized decision forests. In: ECCV12, pp. VI: 852–863 (2012)
Le, T.L., Nguyen, V.N., Tran, T.T.H., Nguyen, V.T., Nguyen, T.T.: Temporal gesture segmentation for recognition. In: 2013 International Conference on Computing, Management and Telecommunications (ComManTel), pp. 369–373 (2013). doi:10.1109/ComManTel.6482422
Li, Y.: Hand gesture recognition using Kinect. In: 2012 IEEE 3rd International Conference on Software Engineering and Service Science (ICSESS), pp. 196–199 (2012). doi:10.1109/ICSESS.2012.6269439
Liang, H., Yuan, J., Thalmann, D., Zhang, Z.: Model-based hand pose estimation via spatial-temporal hand parsing and 3D fingertip localization. Vis. Comput. 29(6–8), 837–848 (2013). doi:10.1007/s00371-013-0822-4
Liddel, S., Johnson, R.E.: American sign language—compound formation processes, lexicalization, and phonological remnants. Nat. Lang. Ling. Theory 4, 445–513 (1986)
Microsoft Kinect for Windows. http://www.microsoft.com/en-us/kinectforwindows (2013)
Mihail, R.P., Jacobs, N., Goldsmith, J.: Real time gesture recognition with 2 Kinect sensors. In: International Conference on Image Processing, Computer Vision, and Pattern Recognition (IPCV) (2012)
Myers, C.S., Rabiner, L.R.: Comparative study of several dynamic time-warping algorithms for connected-word recognition. Bell Syst. Tech. J. 60(7) (1981)
Oikonomidis, I., Kyriazis, N., Argyros, A.: Efficient model-based 3D tracking of hand articulations using Kinect. In: British Machine Vision Conference, pp. 101.1-101.11. British Machine Vision Association (2011). doi:10.5244/C.25.101
Libfreenect. http://openkinect.org (2013)
Standard framework for 3D sensing. http://www.openni.org (2013)
Otsu, N.: A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern. 9(1), 62–66 (1979)
Pedersoli, F.: XKin libraries. https://github.com/fpeder/XKin (2013)
Pedersoli, F., Adami, N., Benini, S., Leonardi, R.: XKin - eXtendable hand pose and gesture recognition library for Kinect. In: Proceedings of ACM Conference on Multimedia 2012—Open Source Competition, Nara (2012)
Peris, M., Fukui, K.: Both-hand gesture recognition based on komsm with volume subspaces for robot teleoperation. In: IEEE-Cyber (2012)
PrimeSense: NiTE. http://www.primesense.com/nite (2013)
PrimeSense: sensing and natural interaction. http://www.primesense.com (2013)
Pugeault, N., Bowden, R.: Spelling it out: real-time asl fingerspelling recognition. IEEE International Conference on Computer Vision Workshops, ICCV, vol. 2011, pp. 1114–1119 (2011)
Rabiner, L.R.: A tutorial on hidden Markov models and selected applications in speech recognition. Proc IEEE 77(2), 257–286 (1989). doi:10.1109/5.18626
Ren, Z., Meng, J., Yuan, J., Zhang, Z.: Robust hand gesture recognition with Kinect sensor. In: Proceedings of the 19th ACM international conference on Multimedia, MM ’11, pp. 759–760. ACM (2011). doi:10.1145/2072298.2072443
Ren, Z., Yuan, J., Meng, J., Zhang, Z.: Robust part-based hand gesture recognition using kinect sensor. IEEE Trans. Multimedia 15(5), 1110–1120 (2013). doi:10.1109/TMM.2013.2246148
Robot Operating System. http://www.ros.org/wiki/ (2013)
Rubine, D.: Specifying gestures by example. SIGGRAPH Comput. Graph. 25(4), 329–337 (1991). doi:10.1145/127719.122753
$1 Unistroke Recognizer. http://depts.washington.edu/aimgroup/proj/dollar/ (2013)
Shotton, J., Fitzgibbon, A., Cook, M., Sharp, T., Finocchio, M., Moore, R., Kipman, A., Blake, A.: Real-time human pose recognition in parts from single depth images. In: Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition, CVPR ’11, pp. 1297–1304. IEEE Computer Society, Washington, DC, USA (2011). doi:10.1109/CVPR.2011.5995316
Uebersax, D., Gall, J., den Bergh, M.V., Gool, L.J.V.: Real-time sign language letter and word recognition from depth data. IEEE International Conference on Computer Vision Workshops, ICCV, vol. 2011, pp. 383–390 (2011)
Wachs, J.P., Kölsch, M., Stern, H., Edan, Y.: Vision-based hand-gesture applications. Commun. ACM 54(2), 60–71 (2011). doi:10.1145/1897816.1897838
Wan, T., Wang, Y., Li, J.: Hand gesture recognition system using depth data. In: Consumer Electronics, Communications and Networks (CECNet), 2012 2nd International Conference on, pp. 1063–1066 (2012). doi:10.1109/CECNet.6201837
Wang, R., Paris, S., Popović, J.: 6d hands: markerless hand-tracking for computer aided design. In: Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology, UIST ’11, pp. 549–558. ACM, New York (2011). doi:10.1145/2047196.2047269
Wobbrock, J.O., Wilson, A.D., Li, Y.: Gestures without libraries, toolkits or training: a $1 recognizer for user interface prototypes. In: Proceedings of the 20th Annual ACM Symposium on User Interface Software and Technology, UIST ’07, pp. 159–168. ACM, New York (2007). doi:10.1145/1294211.1294238
Zhang, H.J., Kankanhalli, A., Smoliar, S.: Automatic partitioning of full-motion video. Multimedia Syst. 1(1), 10–28 (1993). doi:10.1007/BF01210504

Author information

Authors and Affiliations

Department of Information Engineering, University of Brescia, via Branze 38, 25125 Brescia, Italy
Fabrizio Pedersoli, Sergio Benini, Nicola Adami & Riccardo Leonardi

Corresponding author

Correspondence to Fabrizio Pedersoli.

About this article

Cite this article

Pedersoli, F., Benini, S., Adami, N. et al. XKin: an open source framework for hand pose and gesture recognition using kinect. Vis Comput 30, 1107–1122 (2014). https://doi.org/10.1007/s00371-014-0921-x

Issue Date: October 2014