HBE: Hand Branch Ensemble Network for Real-Time 3D Hand Pose Estimation
Abstract
The goal of this paper is to estimate the 3D coordinates of the hand joints from a single depth image. To balance accuracy and real-time performance, we design a novel three-branch Convolutional Neural Network named the Hand Branch Ensemble (HBE) network, where the three branches correspond to three parts of a hand: the thumb, the index finger, and the remaining fingers. The structural design of the HBE network is inspired by the differing functional importance of the fingers. In addition, a feature ensemble layer together with a low-dimensional embedding layer enforces overall hand shape constraints. Experimental results on three public datasets demonstrate that our approach achieves performance comparable to or better than state-of-the-art methods with less training data, shorter training time, and a faster frame rate.
Keywords
Hand pose estimation · Depth image · Convolutional Neural Networks
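To make the architecture described in the abstract concrete, the following is a minimal sketch of a three-branch ensemble regressor in the spirit of the HBE network, written with TensorFlow/Keras. The input resolution, layer widths, the shared convolutional stem, the embedding size, and the joint count are illustrative assumptions, not the configuration reported by the authors.

```python
# Minimal sketch of a three-branch (thumb / index / other fingers) ensemble
# regressor for 3D hand joint coordinates. All sizes are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, Model

def conv_block(x, filters):
    # Plain conv + pooling block; the paper's exact block design may differ.
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return layers.MaxPooling2D(2)(x)

def branch(x, name):
    # Per-part feature extractor for one hand part.
    x = conv_block(x, 32)
    x = conv_block(x, 64)
    return layers.Flatten(name=f"{name}_feat")(x)

depth = layers.Input(shape=(96, 96, 1), name="depth_patch")  # assumed input size
shared = conv_block(depth, 16)                               # assumed shared stem

# Three branches for the thumb, the index finger, and the remaining fingers,
# fused by a feature ensemble (concatenation) layer.
feats = layers.Concatenate(name="feature_ensemble")(
    [branch(shared, n) for n in ("thumb", "index", "others")]
)

# Low-dimensional embedding acting as a holistic hand-shape constraint,
# followed by regression of J joints x 3 coordinates (J = 21 assumed).
emb = layers.Dense(30, activation="relu", name="low_dim_embedding")(feats)
J = 21
joints = layers.Dense(J * 3, name="joints_xyz")(emb)

model = Model(depth, joints)
model.compile(optimizer="adam", loss="mse")  # joint-coordinate regression
```

The split into part-specific branches followed by a concatenation and a low-dimensional bottleneck is the key idea this sketch illustrates: each branch specializes on a functionally distinct part of the hand, while the shared embedding keeps the predicted joints consistent with a plausible overall hand shape.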