
HBE: Hand Branch Ensemble Network for Real-Time 3D Hand Pose Estimation

  • Yidan Zhou
  • Jian Lu
  • Kuo Du
  • Xiangbo Lin (corresponding author)
  • Yi Sun
  • Xiaohong Ma
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11218)

Abstract

The goal of this paper is to estimate the 3D coordinates of the hand joints from a single depth image. To balance accuracy and real-time performance, we design a novel three-branch Convolutional Neural Network named the Hand Branch Ensemble (HBE) network, whose three branches correspond to three parts of the hand: the thumb, the index finger, and the remaining fingers. The structural design of the HBE network is inspired by the differences in functional importance among the fingers. In addition, a feature ensemble layer together with a low-dimensional embedding layer enforces the overall hand shape constraint. Experimental results on three public datasets demonstrate that our approach achieves performance comparable to or better than state-of-the-art methods with less training data, shorter training time, and a faster frame rate.
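The sketch below illustrates the kind of three-branch regression network the abstract describes, written with tf.keras. The branch depths, filter counts, input patch size, joint count, and the 30-dimensional embedding are illustrative assumptions rather than the authors' exact configuration; only the overall structure (per-part branches, a feature ensemble layer, and a low-dimensional embedding before the 3D joint regression) follows the paper's description.

```python
# A minimal sketch of a three-branch hand pose regressor in the spirit of the
# HBE network. All layer sizes below are assumptions for illustration only.
import tensorflow as tf
from tensorflow.keras import layers, Model


def conv_branch(x, name):
    """A small convolutional branch extracting features for one hand part."""
    for i, filters in enumerate((32, 64)):
        x = layers.Conv2D(filters, 3, padding="same", activation="relu",
                          name=f"{name}_conv{i}")(x)
        x = layers.MaxPooling2D(2, name=f"{name}_pool{i}")(x)
    return layers.Flatten(name=f"{name}_flat")(x)


def build_hbe_sketch(input_size=96, num_joints=21, embed_dim=30):
    depth = layers.Input((input_size, input_size, 1), name="depth_patch")
    # Three branches: thumb, index finger, and the remaining fingers.
    feats = [conv_branch(depth, n) for n in ("thumb", "index", "others")]
    # Feature ensemble layer: fuse the per-branch features.
    ensemble = layers.Concatenate(name="feature_ensemble")(feats)
    fc = layers.Dense(1024, activation="relu", name="fc")(ensemble)
    # Low-dimensional embedding acting as a global hand-shape constraint
    # before regressing the full set of 3D joint coordinates.
    embed = layers.Dense(embed_dim, name="low_dim_embedding")(fc)
    joints = layers.Dense(num_joints * 3, name="joints_3d")(embed)
    return Model(depth, joints, name="hbe_sketch")


model = build_hbe_sketch()
model.compile(optimizer="adam", loss="mse")  # regress 3D joint coordinates
```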

Keywords

Hand pose estimation · Depth image · Convolutional Neural Networks


Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Yidan Zhou¹
  • Jian Lu²
  • Kuo Du¹
  • Xiangbo Lin¹ (corresponding author)
  • Yi Sun¹
  • Xiaohong Ma¹
  1. Dalian University of Technology, Dalian, China
  2. Dalian University, Dalian, China
