
Expressive Telepresence via Modular Codec Avatars

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12357)

Abstract

VR telepresence consists of interacting with another human in a virtual space in which each participant is represented by an avatar. Today most avatars are cartoon-like, but soon the technology will allow video-realistic ones. This paper moves in this direction and presents Modular Codec Avatars (MCA), a method to generate hyper-realistic faces driven by the cameras in the VR headset. MCA extends traditional Codec Avatars (CA) by replacing the holistic model with a learned modular representation. Traditional person-specific CAs are learned from few training samples and typically lack robustness and expressiveness when transferring facial expressions. MCAs address these issues by learning a modulated adaptive blending of different facial components together with an exemplar-based latent alignment. We demonstrate that MCA achieves improved expressiveness and robustness compared to CA on a variety of real-world datasets and practical scenarios. Finally, we showcase new applications in VR telepresence enabled by the proposed model.
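
To make the modular idea concrete, below is a minimal, illustrative sketch (not the authors' implementation) of how per-component face decoders could be blended with weights modulated by a driving latent code. The module count, dimensions, network shapes, and the softmax blending scheme are assumptions for illustration only; the paper's actual architecture and its exemplar-based latent alignment are described in the full text.

```python
# Illustrative sketch only: per-module decoders whose outputs are combined with
# blending weights predicted from the driving code. All names and sizes are
# hypothetical; this is not the MCA architecture from the paper.
import torch
import torch.nn as nn

class ModularFaceDecoder(nn.Module):
    def __init__(self, num_modules=4, code_dim=64, out_dim=3 * 64 * 64):
        super().__init__()
        # One small decoder per facial module (e.g. eyes, mouth, ...).
        self.decoders = nn.ModuleList(
            nn.Sequential(nn.Linear(code_dim, 256), nn.ReLU(),
                          nn.Linear(256, out_dim))
            for _ in range(num_modules)
        )
        # Predicts per-module blending weights from the driving code.
        self.blend = nn.Linear(code_dim, num_modules)

    def forward(self, code):
        # code: (batch, code_dim) latent driven by the headset cameras.
        parts = torch.stack([d(code) for d in self.decoders], dim=1)  # (B, M, out_dim)
        w = torch.softmax(self.blend(code), dim=-1).unsqueeze(-1)     # (B, M, 1)
        return (w * parts).sum(dim=1)                                 # adaptively blended output

# Usage: out = ModularFaceDecoder()(torch.randn(2, 64))
```

The softmax-weighted sum is only one simple way to realize adaptive blending; the point it illustrates is that each facial component has its own decoder and the combination is modulated by the driving signal rather than fixed, in contrast to a single holistic decoder.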

Keywords

Virtual reality · Telepresence · Codec Avatar

Supplementary material

Supplementary material 1: 504453_1_En_20_MOESM1_ESM.pdf (PDF, 294 KB)

Supplementary material 2 (MP4, 26,829 KB)


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. University of Toronto, Toronto, Canada
  2. Vector Institute, Toronto, Canada
  3. Facebook Reality Lab, Pittsburgh, USA
