Multimedia Tools and Applications, Volume 62, Issue 3, pp 761–783

Human gesture recognition system for TV viewing using time-of-flight camera

  • Masaki Takahashi
  • Mahito Fujii
  • Masahide Naemura
  • Shin’ichi Satoh


We developed a new device-free user interface for TV viewing that uses a human gesture recognition technique. Although many motion recognition technologies have been reported, no man–machine interface that recognizes a sufficiently large variety of gestures has yet been developed. The main difficulty has been the lack of spatial information that can be acquired from normal video sequences. We overcame this difficulty by using a time-of-flight camera and novel action recognition techniques. The main functions of this system are gesture recognition and posture measurement. The former is performed using the bag-of-features approach, which uses key-point trajectories as features; the use of 4-D spatiotemporal trajectory features is the main technical contribution of the proposed system. The latter is obtained through face detection and object tracking technology. The interface is practical because it does not require any contact-type devices. Several experiments demonstrated the effectiveness of our proposed method and the usefulness of the system.
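As a rough illustration of the bag-of-features step described above, the following Python sketch turns key-point trajectories into descriptors, builds a small codebook, and summarizes each gesture clip as a histogram of codewords. It is not the authors' implementation: the (x, y, depth) sample layout, the displacement-based 4-D descriptor, the plain k-means vocabulary, and every function name (`trajectory_descriptor`, `kmeans`, `bag_of_features`) are assumptions made for this example.

```python
from typing import List, Sequence, Tuple

# One sample of a tracked key point: (x, y, depth). The frame index supplies the
# fourth (temporal) dimension. This layout is an assumption for illustration.
Point3D = Tuple[float, float, float]
Vector = List[float]


def trajectory_descriptor(track: Sequence[Point3D], length: int) -> Vector:
    """Hypothetical 4-D descriptor: concatenated (dx, dy, dz) over `length` frame steps."""
    if len(track) < length + 1:
        raise ValueError("track is shorter than the descriptor length")
    desc: Vector = []
    for (x0, y0, z0), (x1, y1, z1) in zip(track[:length], track[1:length + 1]):
        desc.extend((x1 - x0, y1 - y0, z1 - z0))
    return desc


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((p - q) ** 2 for p, q in zip(a, b))


def nearest(codebook: Sequence[Sequence[float]], v: Sequence[float]) -> int:
    """Index of the codeword closest to v (ties go to the lowest index)."""
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], v))


def kmeans(data: Sequence[Vector], k: int, iters: int = 20) -> List[Vector]:
    """Plain Lloyd's k-means, seeded with the first k descriptors for determinism."""
    centers = [list(v) for v in data[:k]]
    for _ in range(iters):
        groups: List[List[Vector]] = [[] for _ in range(k)]
        for v in data:
            groups[nearest(centers, v)].append(v)
        for i, members in enumerate(groups):
            if members:  # an empty cluster keeps its previous center
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def bag_of_features(descs: Sequence[Vector], codebook: Sequence[Vector]) -> Vector:
    """L1-normalised histogram of codeword assignments for one gesture clip."""
    hist = [0.0] * len(codebook)
    for d in descs:
        hist[nearest(codebook, d)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# Usage on synthetic data: a "push" toward the camera (depth shrinks) and a
# sideways "swipe" (x grows); three identical key-point tracks per clip.
push_tracks = [[(0.0, 0.0, 2.0 - 0.1 * t) for t in range(6)] for _ in range(3)]
swipe_tracks = [[(0.1 * t, 0.0, 2.0) for t in range(6)] for _ in range(3)]
push_descs = [trajectory_descriptor(tr, 5) for tr in push_tracks]
swipe_descs = [trajectory_descriptor(tr, 5) for tr in swipe_tracks]

# Interleave so the two seeds are one push and one swipe descriptor.
training = [d for pair in zip(push_descs, swipe_descs) for d in pair]
codebook = kmeans(training, k=2)
push_hist = bag_of_features(push_descs, codebook)    # [1.0, 0.0]
swipe_hist = bag_of_features(swipe_descs, codebook)  # [0.0, 1.0]
```

Each clip reduces to a fixed-length histogram whatever the number of tracked key points, which is what lets a standard classifier (for example an SVM, not shown) compare clips directly. A real system would train the codebook on descriptors from many recorded gestures, with k much larger than 2.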


Keywords

Gesture recognition · Time-of-flight camera · Depth information · Bag-of-features

Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  • Masaki Takahashi (1; corresponding author)
  • Mahito Fujii (1)
  • Masahide Naemura (1)
  • Shin’ichi Satoh (2)

  1. Japan Broadcasting Corporation, Science and Technology Research Laboratories, Setagaya-ku, Japan
  2. National Institute of Informatics, Chiyoda-ku, Japan