
Weakly Supervised 3D Hand Pose Estimation via Biomechanical Constraints

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12362)

Abstract

Estimating 3D hand pose from 2D images is a difficult inverse problem due to the inherent scale and depth ambiguities. Current state-of-the-art methods train fully supervised deep neural networks with 3D ground-truth data. However, acquiring 3D annotations is expensive, typically requiring calibrated multi-view setups or labour-intensive manual annotation. While annotations of 2D keypoints are much easier to obtain, how to efficiently leverage such weakly-supervised data to improve 3D hand pose prediction remains an important open question. The key difficulty is that directly applying additional 2D supervision mostly benefits the 2D proxy objective but does little to alleviate the depth and scale ambiguities. Embracing this challenge, we propose a set of novel losses that constrain the prediction of a neural network to lie within the range of biomechanically feasible 3D hand configurations. We show through extensive experiments that our proposed constraints significantly reduce the depth ambiguity and allow the network to leverage additional 2D-annotated images more effectively. For example, on the challenging FreiHAND dataset, using additional 2D annotation without our proposed biomechanical constraints reduces the depth error by only 15%, whereas the error is reduced by 50% when the proposed biomechanical constraints are used.
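The abstract does not say which biomechanical quantities are constrained, so the following is only a minimal sketch under stated assumptions, not the authors' loss formulation. It illustrates two points from the abstract. First, 2D labels alone leave scale ambiguous: uniformly scaling a 3D pose about the camera centre leaves its pinhole projection unchanged. Second, a hinge-style range penalty, zero inside a feasible interval and growing linearly outside it, can still tell such poses apart. The finger chain `FINGER_BONES`, the length ranges in `FINGER_LIMITS`, and the helpers `project`, `interval_penalty`, `bone_length_loss` and `scale_pose` are all hypothetical. Constraining bone lengths is likewise an illustrative choice, and the numeric limits are made up.

```python
import math
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]

# Hypothetical joint-index chain for one finger: root -> ... -> tip.
FINGER_BONES = [(0, 1), (1, 2), (2, 3)]
# Hypothetical feasible length range (cm) per bone; NOT values from the paper.
FINGER_LIMITS = [(3.0, 5.0), (2.0, 4.0), (1.5, 2.5)]


def project(point: Point3, focal: float = 1.0) -> Tuple[float, float]:
    """Pinhole projection with the camera at the origin looking down +z."""
    x, y, z = point
    return (focal * x / z, focal * y / z)


def interval_penalty(value: float, lo: float, hi: float) -> float:
    """Zero inside [lo, hi]; grows linearly with the distance outside it."""
    return max(0.0, lo - value) + max(0.0, value - hi)


def bone_length_loss(
    joints: Sequence[Point3],
    bones: Sequence[Tuple[int, int]],
    limits: Sequence[Tuple[float, float]],
) -> float:
    """Mean interval penalty over the bones of a predicted 3D pose."""
    penalties = [
        interval_penalty(math.dist(joints[p], joints[c]), lo, hi)
        for (p, c), (lo, hi) in zip(bones, limits)
    ]
    return sum(penalties) / len(penalties)


def scale_pose(joints: Sequence[Point3], s: float) -> List[Point3]:
    """Scale every joint about the camera centre (the origin)."""
    return [(s * x, s * y, s * z) for x, y, z in joints]


if __name__ == "__main__":
    # Hypothetical finger 50 cm in front of the camera; bone lengths 4, 3, 2 cm.
    pose = [(1.0, 2.0, 50.0), (1.0, 2.0, 54.0), (1.0, 2.0, 57.0), (1.0, 2.0, 59.0)]
    doubled = scale_pose(pose, 2.0)

    # Identical 2D keypoints: 2D supervision alone cannot tell these poses apart.
    print([project(j) for j in pose] == [project(j) for j in doubled])  # True
    # The range penalty can: the original pose is feasible, the doubled one is not.
    print(bone_length_loss(pose, FINGER_BONES, FINGER_LIMITS))  # 0.0
    print(bone_length_loss(doubled, FINGER_BONES, FINGER_LIMITS) > 0.0)  # True
```

Plain Python is used here only for readability. In training, the same hinge expression would be written in an automatic-differentiation framework so that its gradient pushes infeasible predictions back toward the feasible range while leaving feasible ones untouched.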

Keywords

3D hand pose · Weakly-supervised · Biomechanical constraints

Notes

Acknowledgments

We are grateful to Christoph Gebhardt and Shoaib Ahmed Siddiqui for the aid in figure creation and Abhishek Badki for helpful discussions.

Supplementary material

Supplementary material 1: 504472_1_En_13_MOESM1_ESM.pdf (PDF, 1.3 MB)


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Advanced Interactive Technologies, ETH Zurich, Zürich, Switzerland
  2. NVIDIA, Santa Clara, USA
