HTML: A Parametric Hand Texture Model for 3D Hand Reconstruction and Personalization

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12356)

Abstract

3D hand reconstruction from images is a widely-studied problem in computer vision and graphics, and has a particularly high relevance for virtual and augmented reality. Although several 3D hand reconstruction approaches leverage hand models as a strong prior to resolve ambiguities and achieve more robust results, most existing models account only for the hand shape and poses and do not model the texture. To fill this gap, in this work we present HTML, the first parametric texture model of human hands. Our model spans several dimensions of hand appearance variability (e.g., related to gender, ethnicity, or age) and only requires a commodity camera for data acquisition. Experimentally, we demonstrate that our appearance model can be used to tackle a range of challenging problems such as 3D hand reconstruction from a single monocular image. Furthermore, our appearance model can be used to define a neural rendering layer that enables training with a self-supervised photometric loss. We make our model publicly available.
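The parametric texture model described above can be illustrated with a minimal, hypothetical sketch: a linear (PCA-style) appearance model that synthesizes a texture as the mean appearance plus a weighted combination of basis components. All dimensions, variable names, and values here are illustrative toy choices, not the actual sizes or data of the HTML model.

```python
import numpy as np

# Toy dimensions for illustration only (the real model uses a full-resolution UV texture).
N_TEXELS = 64 * 64 * 3       # hypothetical flattened RGB texture size
N_COMPONENTS = 10            # hypothetical number of appearance components

rng = np.random.default_rng(0)
mean_texture = rng.random(N_TEXELS).astype(np.float32)        # mean appearance
basis = rng.standard_normal((N_TEXELS, N_COMPONENTS)) * 0.01  # PCA-style basis
stddevs = np.linspace(1.0, 0.1, N_COMPONENTS)                 # per-component std. dev.

def synthesize(alpha):
    """Generate a texture from low-dimensional appearance parameters alpha."""
    return mean_texture + basis @ (alpha * stddevs)

# Setting alpha = 0 recovers the mean texture; varying alpha spans the
# appearance dimensions (e.g., skin tone) captured by the basis.
tex = synthesize(np.zeros(N_COMPONENTS))
```

Because the model is differentiable in `alpha`, such a layer can sit inside a renderer and be optimized with a photometric loss against the input image, which is the self-supervised use case the abstract mentions.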

Keywords

Hand texture model · Appearance modeling · Hand tracking · 3D hand reconstruction

Notes

Acknowledgments

The authors thank all participants of the data recordings. This work was supported by the ERC Consolidator Grant 4DRepLy (770784).

Supplementary material

Supplementary material 1: 504452_1_En_4_MOESM1_ESM.pdf (PDF, 2.1 MB)

Supplementary material 2 (MP4, 95.5 MB)


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Max Planck Institute for Informatics, Saarbrücken, Germany
  2. RWTH Aachen University, Aachen, Germany
  3. Technical University of Munich, München, Germany
