Abstract
Photo-realistic modeling and rendering of humans is extremely important for virtual reality (VR) environments: the human body and face are highly complex and exhibit large shape variability, and, above all, humans are extremely sensitive observers of other humans. In VR environments, interactivity also plays an important role. While classical computer graphics can produce highly realistic human models, rendering them with true photo-realism is computationally very expensive. In this chapter, we investigate a full end-to-end pipeline for creating hybrid representations of human bodies and faces (animatable volumetric video), combining classical computer graphics models with image-, video-, and example-based approaches. By enriching volumetric video with semantics and animation properties, and by applying new hybrid geometry- and video-based animation methods, we bring volumetric video to life and combine interactivity with photo-realism. Semantic enrichment and geometric animation ability are achieved by establishing temporal consistency in the 3D data, followed by automatic rigging of each frame using a parametric shape-adaptive full human body model. For pose editing, we exploit the captured data as much as possible and kinematically deform selected captured frames to fit a desired pose. The face is treated differently from the body in a hybrid geometry- and video-based animation approach: coarse movements and poses are modeled in the geometry, while the fine and subtle facial details that purely geometric methods often miss are captured in video-based textures, which are processed so they can be interactively combined into new facial expressions. Finally, we learn the appearance of regions that are challenging to synthesize, such as the teeth or the eyes, and fill in missing regions realistically with an autoencoder-based approach.
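The kinematic pose deformation mentioned above — deforming a rigged, captured frame to fit a desired pose — is typically realized with a blend-skinning scheme, where each vertex follows a weighted combination of bone transforms. The sketch below is a minimal, illustrative implementation of linear blend skinning; the function name, array shapes, and the two-bone example are assumptions for demonstration, not the chapter's actual implementation.

```python
import numpy as np

def blend_skinning(vertices, weights, transforms):
    """Deform rest-pose vertices by a weighted blend of bone transforms (LBS).

    vertices:   (V, 3) rest-pose vertex positions
    weights:    (V, B) per-vertex skinning weights, rows summing to 1
    transforms: (B, 4, 4) homogeneous bone transforms for the target pose
    """
    # Lift vertices to homogeneous coordinates: (V, 4)
    homo = np.concatenate([vertices, np.ones((len(vertices), 1))], axis=1)
    # Per-vertex blended transform: sum_b weights[v, b] * transforms[b]
    blended = np.einsum('vb,bij->vij', weights, transforms)  # (V, 4, 4)
    # Apply each vertex's blended transform and drop the homogeneous coordinate
    return np.einsum('vij,vj->vi', blended, homo)[:, :3]

# Two-bone example: bone 0 stays at rest, bone 1 is translated by (0, 1, 0).
verts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
w = np.array([[1.0, 0.0], [0.0, 1.0]])            # each vertex bound to one bone
t0 = np.eye(4)
t1 = np.eye(4)
t1[:3, 3] = [0.0, 1.0, 0.0]
deformed = blend_skinning(verts, w, np.stack([t0, t1]))
```

After this call, the vertex bound to the static bone stays at the origin while the vertex bound to the translated bone moves to (1, 1, 0). In the pipeline described here, the transforms would come from the automatically fitted parametric body model rather than being specified by hand.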
© 2020 Springer Nature Switzerland AG
Cite this chapter
Eisert, P., Hilsmann, A. (2020). Hybrid Human Modeling: Making Volumetric Video Animatable. In: Magnor, M., Sorkine-Hornung, A. (eds) Real VR – Immersive Digital Reality. Lecture Notes in Computer Science(), vol 11900. Springer, Cham. https://doi.org/10.1007/978-3-030-41816-8_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-41815-1
Online ISBN: 978-3-030-41816-8