Abstract
Photo-realistic modeling and rendering of humans is extremely important for virtual reality (VR) environments: the human body and face are highly complex and exhibit large shape variability, and, above all, humans are extremely sensitive observers of other humans. In VR environments, interactivity also plays an important role. While classical computer graphics can produce highly realistic human models, rendering them with true photo-realism is computationally very expensive. In this chapter, we investigate a full end-to-end pipeline for creating hybrid representations of human bodies and faces (animatable volumetric video), combining classical computer graphics models with image-, video-, and example-based approaches. By enriching volumetric video with semantics and animation properties, and by applying new hybrid geometry- and video-based animation methods, we bring volumetric video to life and combine interactivity with photo-realism. Semantic enrichment and geometric animation ability are achieved by establishing temporal consistency in the 3D data, followed by automatic rigging of each frame using a parametric shape-adaptive full human body model. For pose editing, we exploit the captured data as much as possible and kinematically deform selected captured frames to fit a desired pose. The face is treated differently from the body in a hybrid geometry- and video-based animation approach: coarse movements and poses are modeled in the geometry, while the fine and subtle facial details that purely geometric methods often miss are captured in video-based textures, which are processed so they can be interactively combined into new facial expressions. Finally, we learn the appearance of regions that are challenging to synthesize, such as the teeth or the eyes, and fill in missing regions realistically with an autoencoder-based approach.
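The kinematic pose deformation mentioned above — deforming a rigged, captured frame to fit a desired pose — is typically realized with a blend-skinning scheme, where each vertex follows a weighted combination of bone transforms. The sketch below is a minimal, illustrative implementation of linear blend skinning; the function name, array shapes, and the two-bone example are assumptions for demonstration, not the chapter's actual implementation.

```python
import numpy as np

def blend_skinning(vertices, weights, transforms):
    """Deform rest-pose vertices by a weighted blend of bone transforms (LBS).

    vertices:   (V, 3) rest-pose vertex positions
    weights:    (V, B) per-vertex skinning weights, rows summing to 1
    transforms: (B, 4, 4) homogeneous bone transforms for the target pose
    """
    # Lift vertices to homogeneous coordinates: (V, 4)
    homo = np.concatenate([vertices, np.ones((len(vertices), 1))], axis=1)
    # Per-vertex blended transform: sum_b weights[v, b] * transforms[b]
    blended = np.einsum('vb,bij->vij', weights, transforms)  # (V, 4, 4)
    # Apply each vertex's blended transform and drop the homogeneous coordinate
    return np.einsum('vij,vj->vi', blended, homo)[:, :3]

# Two-bone example: bone 0 stays at rest, bone 1 is translated by (0, 1, 0).
verts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
w = np.array([[1.0, 0.0], [0.0, 1.0]])            # each vertex bound to one bone
t0 = np.eye(4)
t1 = np.eye(4)
t1[:3, 3] = [0.0, 1.0, 0.0]
deformed = blend_skinning(verts, w, np.stack([t0, t1]))
```

After this call, the vertex bound to the static bone stays at the origin while the vertex bound to the translated bone moves to (1, 1, 0). In the pipeline described here, the transforms would come from the automatically fitted parametric body model rather than being specified by hand.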
© 2020 Springer Nature Switzerland AG
Cite this chapter
Eisert, P., Hilsmann, A. (2020). Hybrid Human Modeling: Making Volumetric Video Animatable. In: Magnor, M., Sorkine-Hornung, A. (eds) Real VR – Immersive Digital Reality. Lecture Notes in Computer Science(), vol 11900. Springer, Cham. https://doi.org/10.1007/978-3-030-41816-8_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-41815-1
Online ISBN: 978-3-030-41816-8