HandMap: Robust Hand Pose Estimation via Intermediate Dense Guidance Map Supervision

  • Xiaokun Wu
  • Daniel Finnegan
  • Eamonn O’Neill
  • Yong-Liang Yang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11220)


This work presents a novel hand pose estimation framework based on intermediate dense guidance map supervision. Detection-based methods benefit from predicting heat maps of hand joints; we carry that advantage into a regression-based framework by supervising intermediate dense feature maps, so the final pose estimate is not limited by the resolution of the heat map. The dense feature maps are carefully designed to encode the hand geometry and the spatial relation between each local joint and the global hand. The proposed framework significantly improves on the state of the art in both 2D and 3D on recent benchmark datasets.
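The resolution limit that the abstract attributes to heat-map readout can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the Gaussian encoding, and the argmax decoding are assumptions chosen only to show why a joint decoded from a coarse heat map is quantised to the map's cell size.

```python
import math

def gaussian_heatmap(joint_xy, size, sigma=1.0):
    """Render a size x size heat map peaked at joint_xy (heat-map pixel units)."""
    jx, jy = joint_xy
    return [[math.exp(-((x - jx) ** 2 + (y - jy) ** 2) / (2.0 * sigma ** 2))
             for x in range(size)]
            for y in range(size)]

def argmax_2d(heatmap):
    """Return the integer (x, y) cell with the largest response."""
    best = max((v, x, y) for y, row in enumerate(heatmap) for x, v in enumerate(row))
    return best[1], best[2]

def detect_joint(joint_xy_image, image_size, map_size):
    """Hypothetical detection-style readout: encode on a coarse map, decode by argmax."""
    scale = map_size / image_size          # heat-map pixels per image pixel
    jx, jy = joint_xy_image
    heatmap = gaussian_heatmap((jx * scale, jy * scale), map_size)
    cx, cy = argmax_2d(heatmap)
    return cx / scale, cy / scale          # back to image coordinates

# A joint at (37.3, 90.6) in a 128 x 128 image, read from a 32 x 32 heat map,
# snaps to the nearest cell centre, so each coordinate is off by up to 2 pixels.
coarse = detect_joint((37.3, 90.6), image_size=128, map_size=32)
print(coarse)  # (36.0, 92.0)
```

A regression head that outputs continuous coordinates directly has no such quantisation, which is the property the abstract refers to when it says the framework is not limited by the heat-map resolution.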


  • Hand pose estimation
  • Dense guidance map
  • Intermediate supervision



We are grateful to the anonymous reviewers for their comments and suggestions. The work was supported by CAMERA, the RCUK Centre for the Analysis of Motion, Entertainment Research and Applications, EP/M023281/1.



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Xiaokun Wu (1)
  • Daniel Finnegan (1)
  • Eamonn O’Neill (1)
  • Yong-Liang Yang (1)

  1. Department of Computer Science, University of Bath, Bath, UK
