
Hybrid Human Modeling: Making Volumetric Video Animatable

Real VR – Immersive Digital Reality

Abstract

Photo-realistic modeling and rendering of humans is extremely important for virtual reality (VR) environments: the human body and face are highly complex and exhibit large shape variability, and, above all, human observers are extremely sensitive to the appearance of other humans. Interactivity also plays an important role in VR environments. While purely computer-graphics modeling can achieve highly realistic human models, achieving true photo-realism with these models is computationally extremely expensive. In this chapter, a full end-to-end pipeline for the creation of hybrid representations for human bodies and faces (animatable volumetric video) is investigated, combining classical computer-graphics models with image-, video-, and example-based approaches: by enriching volumetric video with semantics and animation properties and applying new hybrid geometry- and video-based animation methods, we bring volumetric video to life and combine interactivity with photo-realism. Semantic enrichment and geometric animatability are achieved by establishing temporal consistency in the 3D data, followed by automatic rigging of each frame using a parametric, shape-adaptive full human body model. For pose editing, we exploit the captured data as much as possible and kinematically deform selected captured frames to fit a desired pose. Further, we treat the face differently from the body in a hybrid geometry- and video-based animation approach: coarse movements and poses are modeled in the geometry, while very fine and subtle details in the face, often lacking in purely geometric methods, are captured in video-based textures. These textures are processed so that they can be interactively combined to form new facial expressions. On top of that, we learn the appearance of regions that are challenging to synthesize, such as the teeth or the eyes, and fill in missing regions realistically with an autoencoder-based approach.
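The kinematic deformation step described above — posing a rigged frame by blending per-bone transforms according to skinning weights — is commonly realized with linear blend skinning. The sketch below is a minimal illustration of that standard technique, not the chapter's actual implementation; the function name and the simple two-bone setup are illustrative assumptions.

```python
import numpy as np

def linear_blend_skinning(vertices, weights, bone_transforms):
    """Deform rest-pose vertices by a weighted blend of bone transforms.

    vertices:        (V, 3) rest-pose vertex positions
    weights:         (V, B) skinning weights; each row sums to 1
    bone_transforms: (B, 4, 4) homogeneous transform per bone
    """
    V = vertices.shape[0]
    homo = np.hstack([vertices, np.ones((V, 1))])            # (V, 4) homogeneous coords
    # Apply every bone transform to every vertex: (B, V, 4)
    per_bone = np.einsum('bij,vj->bvi', bone_transforms, homo)
    # Blend the per-bone results with the skinning weights: (V, 4)
    blended = np.einsum('vb,bvi->vi', weights, per_bone)
    return blended[:, :3]

# Tiny example: two vertices, two bones; bone 1 translates by 2 along x.
verts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
w = np.array([[1.0, 0.0], [0.0, 1.0]])
T = np.tile(np.eye(4), (2, 1, 1))
T[1, 0, 3] = 2.0
posed = linear_blend_skinning(verts, w, T)   # vertex 1 moves to (3, 0, 0)
```

In an animatable-volumetric-video setting, the rest pose would be a captured mesh frame and the weights would come from the fitted parametric body model, so the captured surface detail is carried along into the new pose.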



Author information

Corresponding author: Peter Eisert.


Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter


Cite this chapter

Eisert, P., Hilsmann, A. (2020). Hybrid Human Modeling: Making Volumetric Video Animatable. In: Magnor, M., Sorkine-Hornung, A. (eds.) Real VR – Immersive Digital Reality. Lecture Notes in Computer Science, vol. 11900. Springer, Cham. https://doi.org/10.1007/978-3-030-41816-8_7


  • DOI: https://doi.org/10.1007/978-3-030-41816-8_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-41815-1

  • Online ISBN: 978-3-030-41816-8

  • eBook Packages: Computer Science (R0)
