Multimedia Tools and Applications

Volume 76, Issue 20, pp 20423–20455

Hand gesture recognition from depth and infrared Kinect data for CAVE applications interaction

  • Diego Q. Leite
  • Julio C. Duarte
  • Luiz P. Neves
  • Jauvane C. de Oliveira
  • Gilson A. Giraldi


This paper presents a real-time framework that combines depth data and infrared laser speckle pattern (ILSP) images, both captured by a Kinect device, to recognize static hand gestures for interaction with CAVE applications. At system startup, background removal and hand position detection are performed using only the depth map. Tracking then begins, using the hand positions from previous frames to locate the hand centroid in the current frame. This centroid serves as the seed for a region growing algorithm that segments the hand in the depth map. The resulting mask is then used to segment the hand in the ILSP frame sequence. Next, motion restrictions are applied for gesture spotting, labeling each image as ‘Gesture’ or ‘Non-Gesture’. The ILSP counterparts of the frames labeled ‘Gesture’ are enhanced by mask subtraction, contrast stretching, median filtering, and histogram equalization. The enhanced images are the input to feature extraction with the scale-invariant feature transform (SIFT), followed by bag-of-visual-words construction and classification with a multi-class support vector machine (SVM). Finally, a grammar built on the hand gesture classes converts the classification results into control commands for the CAVE application. Tests and comparisons show that the implemented plugin is an efficient solution: it achieves state-of-the-art recognition accuracy and supports efficient object manipulation in a virtual scene visualized in the CAVE.
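The segmentation and enhancement steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the growth criterion, the connectivity, or the tolerance, so the 4-connected neighbourhood, the `tol` parameter, and the function names `region_grow`, `apply_mask`, and `contrast_stretch` are all assumptions made for illustration. Images are plain nested lists so the example runs with the Python standard library only.

```python
from collections import deque


def region_grow(depth, seed, tol):
    """Grow a 4-connected region from `seed` over a depth map.

    Assumed criterion (not from the paper): a neighbour joins the region
    when its depth differs from the pixel it was reached from by at most
    `tol`. Returns a binary mask as a list of lists of 0/1.
    """
    rows, cols = len(depth), len(depth[0])
    mask = [[0] * cols for _ in range(rows)]
    mask[seed[0]][seed[1]] = 1
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and not mask[nr][nc] and abs(depth[nr][nc] - depth[r][c]) <= tol:
                mask[nr][nc] = 1
                queue.append((nr, nc))
    return mask


def apply_mask(image, mask):
    """Keep image pixels where the mask is 1; zero out everything else."""
    return [[p if m else 0 for p, m in zip(irow, mrow)]
            for irow, mrow in zip(image, mask)]


def contrast_stretch(image, mask):
    """Linearly rescale the masked pixels to the full 0..255 range."""
    vals = [p for irow, mrow in zip(image, mask) for p, m in zip(irow, mrow) if m]
    lo, hi = min(vals), max(vals)
    span = max(hi - lo, 1)  # avoid division by zero on a flat region
    return [[(p - lo) * 255 // span if m else 0 for p, m in zip(irow, mrow)]
            for irow, mrow in zip(image, mask)]


if __name__ == "__main__":
    # Toy depth map (millimetres): a near "hand" blob on a far background.
    depth = [[2000, 2000, 2000, 2000],
             [2000,  800,  810, 2000],
             [2000,  805,  815, 2000],
             [2000, 2000, 2000, 2000]]
    # Toy infrared frame aligned pixel-for-pixel with the depth map.
    ilsp = [[10, 20, 30, 40],
            [50, 60, 70, 80],
            [90, 100, 110, 120],
            [130, 140, 150, 160]]
    hand_mask = region_grow(depth, seed=(1, 1), tol=20)
    print(apply_mask(ilsp, hand_mask))
    print(contrast_stretch(ilsp, hand_mask))
```

The mask is computed once on the depth map and reused on the infrared frame, which mirrors the order of operations in the abstract. The sketch assumes the two images are already registered to the same pixel grid; the median filter and histogram equalization stages are omitted for brevity.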


Keywords: Kinect; Depth and speckle pattern images; Gesture spotting; Bag-of-visual-words; SVM; Hand gesture recognition; CAVE



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • Diego Q. Leite (1, 2)
  • Julio C. Duarte (2)
  • Luiz P. Neves (3)
  • Jauvane C. de Oliveira (1)
  • Gilson A. Giraldi (1)

  1. National Laboratory for Scientific Computing, Petrópolis, Brazil
  2. Military Institute of Engineering, Rio de Janeiro, Brazil
  3. Federal University of Paraná, Curitiba, Brazil
