Multimedia Tools and Applications, Volume 75, Issue 1, pp 261–279

Kinect-based Taiwanese sign-language recognition system

  • Greg C. Lee
  • Fu-Hao Yeh
  • Yi-Han Hsiao


Gesture recognition is an important component of many intelligent human–computer interaction applications. For example, a real-time sign-language recognition system would detect and interpret hand gestures. Many vision-based sign-language recognition methods have been proposed over the years, with mixed results in terms of usability. Some systems are limited to recognizing only a few gestures, while others require a 3D camera to provide depth information that improves recognition accuracy. In this paper, a Kinect-based Taiwanese sign-language recognition system is proposed. Three main features are extracted from the signing gestures: hand positions, hand signing direction, and hand shapes. The hand positions are readily available from the input sensor. The signing direction is determined by applying a hidden Markov model (HMM) to the trajectory of the hand movement, and a support vector machine (SVM) is trained and used to recognize the hand shapes. Experimental results show that the proposed system achieves a recognition rate of 85.14%.
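The signing-direction step can be illustrated with a minimal sketch: a hand trajectory is quantized into direction codes, each candidate direction is scored with the HMM forward algorithm, and the best-scoring model wins. The abstract does not specify the paper's HMM topology, observation features, or trained parameters, so everything below is an assumption made for illustration: the 8-code quantization, the two-state left-to-right model, the hand-set probabilities, the y-axis pointing up, and the hypothetical names `quantize`, `toy_model`, and `classify_direction`.

```python
import math

N_CODES = 8  # number of direction codes (an assumption, not from the paper)


def quantize(points):
    """Convert a list of (x, y) hand positions into direction codes 0..7.

    Code 0 is +x ("right"), code 2 is +y ("up"), and so on counterclockwise.
    """
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        if dx == 0 and dy == 0:
            continue  # skip frames with no movement
        angle = math.atan2(dy, dx) % (2 * math.pi)
        codes.append(int(round(angle / (math.pi / 4))) % N_CODES)
    return codes


def forward_log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: returns log P(obs | HMM)."""
    if not obs:
        raise ValueError("empty observation sequence")
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][obs[t]]
                for s in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        log_p += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_p


def toy_model(main_code, peak=0.65):
    """Hypothetical 2-state left-to-right HMM favoring one direction code.

    The probabilities are set by hand for illustration; a real system would
    estimate them from training data (for example with Baum-Welch).
    """
    rest = (1.0 - peak) / (N_CODES - 1)
    row = [peak if c == main_code else rest for c in range(N_CODES)]
    return {
        "start": [1.0, 0.0],
        "trans": [[0.7, 0.3], [0.0, 1.0]],
        "emit": [row, list(row)],
    }


# One illustrative model per signing direction (labels are assumptions).
MODELS = {
    "right": toy_model(0),
    "up": toy_model(2),
    "left": toy_model(4),
    "down": toy_model(6),
}


def classify_direction(points):
    """Return the label of the model with the highest log-likelihood."""
    obs = quantize(points)
    scores = {
        name: forward_log_likelihood(obs, m["start"], m["trans"], m["emit"])
        for name, m in MODELS.items()
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # A slightly noisy left-to-right hand movement.
    print(classify_direction([(0, 0), (1, 0.1), (2, 0), (3, -0.1), (4, 0)]))
```

Scoring with per-class HMMs rather than comparing the net displacement lets the classifier tolerate jitter in individual frames, which is one plausible reason to model the trajectory as a sequence.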


Keywords: Sign-language recognition, Gesture recognition, Kinect



This research was partially supported by the Ministry of Science and Technology of Taiwan, R.O.C., under grant numbers 100-2511-S-003-020-MY2 and 101-2511-S-003-057-MY3.



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. Department of Computer Science and Information Engineering, National Taiwan Normal University, Taipei, Republic of China
  2. Program of Information Technology, Fooyin University, Kaohsiung, Republic of China
