A Combined Strategy of Hand Tracking for Desktop VR

  • Shufang Lu
  • Li Cai
  • Xuefeng Ding
  • Fei Gao
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11166)


Desktop VR has been widely used in data analysis and VR movies. One important interaction in VR is capturing and tracking the 3D motion of the hands. Although 3D hand pose estimation has been studied for many years, a trade-off between real-time performance and accuracy still exists. In this paper, we propose a strategy that combines a fast model-based method with a Convolutional Neural Network (CNN). Based on the occlusion in the hand depth image captured by an Intel RealSense camera, simple gesture images are recognized by the fast model-based method and complex gesture images by the CNN. Extensive experiments demonstrate that our method achieves real-time performance with high accuracy.
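The routing idea in the abstract can be illustrated with a minimal sketch. It is not the implementation from the paper: the abstract does not say how occlusion is measured, so the `occlusion_score` proxy (the fraction of neighbouring hand pixels separated by a large depth jump), the `jump_mm` and `threshold` values, and the two estimator callables are all hypothetical placeholders chosen only to show the dispatch structure.

```python
from typing import Callable, List, Sequence, Tuple

DepthImage = Sequence[Sequence[float]]  # depth in millimetres, 0 = no reading
Pose = List[float]                      # placeholder for joint parameters


def occlusion_score(depth: DepthImage, jump_mm: float = 30.0) -> float:
    """Hypothetical occlusion proxy (an assumption, not from the paper).

    Returns the fraction of horizontally adjacent valid pixel pairs whose
    depth differs by more than jump_mm. Large jumps inside the hand region
    suggest that one part of the hand lies in front of another.
    """
    pairs = 0
    jumps = 0
    for row in depth:
        for a, b in zip(row, row[1:]):
            if a > 0 and b > 0:  # skip pixels without a depth reading
                pairs += 1
                if abs(a - b) > jump_mm:
                    jumps += 1
    return jumps / pairs if pairs else 0.0


def track_hand(
    depth: DepthImage,
    model_based: Callable[[DepthImage], Pose],
    cnn: Callable[[DepthImage], Pose],
    threshold: float = 0.1,  # assumed cut-off between simple and complex
) -> Tuple[str, Pose]:
    """Send low-occlusion frames to the fast estimator, the rest to the CNN."""
    if occlusion_score(depth) < threshold:
        return "model", model_based(depth)
    return "cnn", cnn(depth)


# Usage with stub estimators standing in for the two real methods.
flat = [[400.0] * 4 for _ in range(4)]                     # open palm, no jumps
folded = [[400.0, 450.0, 400.0, 450.0] for _ in range(4)]  # strong depth jumps
print(track_hand(flat, lambda d: [0.0], lambda d: [1.0])[0])
print(track_hand(folded, lambda d: [0.0], lambda d: [1.0])[0])
```

The point of the design is that the expensive CNN runs only on frames the cheap test flags as complex, which is how a combined strategy can stay real-time on average. Any other occlusion measure could replace the proxy without changing the dispatch.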


Desktop VR, 3D hand tracking, Computer vision, Combined strategy



This work is supported by the Natural Science Foundation of China (No. 61402410) and Zhejiang Provincial Science and Technology Planning Key Project of China (No. 2018C01064).



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, China
