DeepHandMesh: A Weakly-Supervised Deep Encoder-Decoder Framework for High-Fidelity Hand Mesh Modeling

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12347)


Human hands play a central role in interacting with other people and objects. For realistic replication of such hand motions, high-fidelity hand meshes have to be reconstructed. In this study, we firstly propose DeepHandMesh, a weakly-supervised deep encoder-decoder framework for high-fidelity hand mesh modeling. We design our system to be trained in an end-to-end and weakly-supervised manner; therefore, it does not require groundtruth meshes. Instead, it relies on weaker supervisions such as 3D joint coordinates and multi-view depth maps, which are easier to get than groundtruth meshes and do not dependent on the mesh topology. Although the proposed DeepHandMesh is trained in a weakly-supervised way, it provides significantly more realistic hand mesh than previous fully-supervised hand models. Our newly introduced penetration avoidance loss further improves results by replicating physical interaction between hand parts. Finally, we demonstrate that our system can also be applied successfully to the 3D hand mesh estimation from general images. Our hand model, dataset, and codes are publicly available(



This work was partially supported by the Next-Generation Information Computing Development Program (NRF-2017M3C4A7069369) and the Visual Turing Test project (IITP-2017-0-01780) funded by the Ministry of Science and ICT of Korea.

Supplementary material

504434_1_En_26_MOESM1_ESM.pdf (1.7 mb)
Supplementary material 1 (pdf 1766 KB)

Supplementary material 2 (mp4 10026 KB)

Supplementary material 3 (mp4 25655 KB)

Supplementary material 4 (mp4 7140 KB)

Supplementary material 5 (mp4 16740 KB)

Supplementary material 6 (mp4 17359 KB)


  1. 1.
    Baek, S., Kim, K.I., Kim, T.K.: Pushing the envelope for RGB-based dense 3D hand pose estimation via neural rendering. In: CVPR (2019)Google Scholar
  2. 2.
    Bogo, F., Romero, J., Loper, M., Black, M.J.: FAUST: dataset and evaluation for 3D mesh registration. In: CVPR (2014)Google Scholar
  3. 3.
    Boukhayma, A., de Bem, R., Torr, P.H.: 3D hand shape and pose from images in the wild. In: CVPR (2019)Google Scholar
  4. 4.
    Cai, Y., Ge, L., Cai, J., Yuan, J.: Weakly-supervised 3D hand pose estimation from monocular RGB images. In: ECCV (2018)Google Scholar
  5. 5.
    Galliani, S., Lasinger, K., Schindler, K.: Massively parallel multiview stereopsis by surface normal diffusion. In: ICCV (2015)Google Scholar
  6. 6.
    Ge, L., Liang, H., Yuan, J., Thalmann, D.: Robust 3D hand pose estimation in single depth images: from single-view CNN to multi-view CNNs. In: CVPR (2016)Google Scholar
  7. 7.
    Ge, L., et al.: 3D hand shape and pose estimation from a single RGB image. In: CVPR (2019)Google Scholar
  8. 8.
    Girshick, R.: Fast R-CNN. In: ICCV (2015)Google Scholar
  9. 9.
    Hasson, Y., et al.: Learning joint reconstruction of hands and manipulated objects. In: CVPR (2019)Google Scholar
  10. 10.
    He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)Google Scholar
  11. 11.
    Hirshberg, D.A., Loper, M., Rachlin, E., Black, M.J.: Coregistration: simultaneous alignment and modeling of articulated 3D shape. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7577, pp. 242–255. Springer, Heidelberg (2012). Scholar
  12. 12.
    Iqbal, U., Molchanov, P., Breuel Juergen Gall, T., Kautz, J.: Hand pose estimation via latent 2.5D heatmap regression. In: ECCV (2018)Google Scholar
  13. 13.
    Jiang, Z.H., Wu, Q., Chen, K., Zhang, J.: Disentangled representation learning for 3D face shape. In: CVPR (2019)Google Scholar
  14. 14.
    Kato, H., Ushiku, Y., Harada, T.: Neural 3D mesh renderer. In: CVPR (2018)Google Scholar
  15. 15.
    Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: ICLR (2014)Google Scholar
  16. 16.
    Kulon, D., Wang, H., Güler, R.A., Bronstein, M., Zafeiriou, S.: Single image 3D hand reconstruction with mesh convolutions. In: BMVC (2019)Google Scholar
  17. 17.
    Li, W., et al.: Rethinking on multi-stage networks for human pose estimation. arXiv preprint arXiv:1901.00148 (2019)
  18. 18.
    Liu, S., Chen, W., Li, T., Li, H.: Soft rasterizer: differentiable rendering for unsupervised single-view mesh reconstruction. In: ICCV (2019)Google Scholar
  19. 19.
    Loper, M., Mahmood, N., Romero, J., Pons-Moll, G., Black, M.J.: SMPL: a skinned multi-person linear model. ACM TOG 34(6), 1–6 (2015)CrossRefGoogle Scholar
  20. 20.
    Moon, G., Yong Chang, J., Mu Lee, K.: V2V-PoseNet: voxel-to-voxel prediction network for accurate 3D hand and human pose estimation from a single depth map. In: CVPR (2018)Google Scholar
  21. 21.
    Moon, G., Yu, S.I., Wen, H., Shiratori, T., Lee, K.M.: InterHand2.6M: a dataset and baseline for 3D interacting hand pose estimation from a single RGB image. In: ECCV (2020)Google Scholar
  22. 22.
    Mueller, F., et al.: GANerated hands for real-time 3D hand tracking from monocular RGB. In: CVPR (2018)Google Scholar
  23. 23.
    Panteleris, P., Oikonomidis, I., Argyros, A.: Using a single RGB frame for real time 3D hand pose estimation in the wild. In: WACV (2018)Google Scholar
  24. 24.
    Paszke, A., et al.: Automatic differentiation in PyTorch (2017)Google Scholar
  25. 25.
    Qian, C., Sun, X., Wei, Y., Tang, X., Sun, J.: Realtime and robust hand tracking from depth. In: CVPR (2014)Google Scholar
  26. 26.
    Romero, J., Tzionas, D., Black, M.J.: Embodied hands: modeling and capturing hands and bodies together. ACM TOG 36(6), 245 (2017)CrossRefGoogle Scholar
  27. 27.
    Russakovsky, O., et al.: ImageNet large scale visual recognition challenge. IJCV 1153(3), 211–252 (2015)MathSciNetCrossRefGoogle Scholar
  28. 28.
    Sharp, T., et al.: Accurate, robust, and flexible real-time hand tracking. In: ACM Conference on Human Factors in Computing Systems (2015)Google Scholar
  29. 29.
    Simon, T., Joo, H., Matthews, I., Sheikh, Y.: Hand keypoint detection in single images using multiview bootstrapping. In: CVPR (2017)Google Scholar
  30. 30.
    Spurr, A., Song, J., Park, S., Hilliges, O.: Cross-modal deep variational hand pose estimation. In: CVPR (2018)Google Scholar
  31. 31.
    Tagliasacchi, A., Schröder, M., Tkach, A., Bouaziz, S., Botsch, M., Pauly, M.: Robust articulated-ICP for real-time hand tracking. In: Computer Graphics Forum (2015)Google Scholar
  32. 32.
    Tang, D., Taylor, J., Kohli, P., Keskin, C., Kim, T.K., Shotton, J.: Opening the black box: hierarchical sampling optimization for estimating human hand pose. In: ICCV (2015)Google Scholar
  33. 33.
    Tompson, J., Stein, M., Lecun, Y., Perlin, K.: Real-time continuous pose recovery of human hands using convolutional networks. ACM TOG 33(5), 1 (2014)CrossRefGoogle Scholar
  34. 34.
    Wan, C., Probst, T., Gool, L.V., Yao, A.: Self-supervised 3D hand pose estimation through training by fitting. In: CVPR (2019)Google Scholar
  35. 35.
    Yang, L., Yao, A.: Disentangling latent hands for image synthesis and pose estimation. In: CVPR (2019)Google Scholar
  36. 36.
    Yuan, S., Ye, Q., Stenger, B., Jain, S., Kim, T.K.: BigHand2.2M benchmark: hand pose dataset and state of the art analysis. In: CVPR (2017)Google Scholar
  37. 37.
    Zhang, J., Jiao, J., Chen, M., Qu, L., Xu, X., Yang, Q.: 3D hand pose tracking and estimation using stereo matching. In: ICIP (2017)Google Scholar
  38. 38.
    Zhou, X., Wan, Q., Zhang, W., Xue, X., Wei, Y.: Model-based deep hand pose estimation. In: IJCAI (2016)Google Scholar
  39. 39.
    Zimmermann, C., Brox, T.: Learning to estimate 3D hand pose from single RGB images. In: ICCV (2017)Google Scholar
  40. 40.
    Zimmermann, C., Ceylan, D., Yang, J., Russell, B., Argus, M., Brox, T.: FreiHAND: a dataset for markerless capture of hand pose and shape from single RGB images. In: ICCV (2019)Google Scholar

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. 1.ECE & ASRI, Seoul National UniversitySeoulKorea
  2. 2.Facebook Reality LabsPittsburghUSA

Personalised recommendations