Reconstructing 3D Human Avatars from Monocular Images

  • Thiemo Alldieck
  • Moritz Kappel
  • Susana Castillo
  • Marcus Magnor
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11900)

Abstract

Creating convincing representations of humans is a fundamental problem in both traditional arts and modern media. In our digital world, virtual avatars allow us to simulate and render the human body for a variety of applications, including movie production, sports, human-computer interaction, and medical sciences. However, capturing digital representations of a person’s shape, appearance, and motion is an expensive and time-consuming process that usually requires extensive manual adjustment.

With the advances in consumer-grade virtual reality devices, personalized virtual avatars have become an essential part of interactive and immersive applications like telepresence and virtual try-on for online fashion shopping, thereby increasing the need for versatile, easy-to-use self-digitization.

In this chapter, we discuss a selection of recent acquisition methods for personalized human avatar reconstruction. In contrast to conventional setups, these fully automatic approaches use only low-cost monocular video cameras to effectively fuse information from multiple points in time and realistically complete reconstructions from sparse observations. We address both straightforward and sophisticated reconstruction methods focused on accuracy, simplicity, and usability to compare and provide insights into their visual fidelity and robustness.

Keywords

Human modeling, Cameras, Image reconstruction, Three-dimensional displays

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. TU Braunschweig, Braunschweig, Germany
