Joint Optimization for Multi-person Shape Models from Markerless 3D-Scans

Zeitvogel, Samuel; Dornheim, Johannes; Laubenheimer, Astrid

doi:10.1007/978-3-030-58523-5_3

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12363)

Included in the following conference series:

European Conference on Computer Vision

Abstract

We propose a markerless end-to-end training framework for parametric 3D human shape models. Training statistical 3D human shape models with minimal supervision is an important problem in computer vision. In contrast to prior work, the training process (i) uses a differentiable shape model surface and (ii) runs end-to-end by jointly optimizing all parameters of a single, self-contained objective that can be solved with slightly modified off-the-shelf non-linear least squares solvers. Training requires only a compact model definition and an off-the-shelf 2D RGB pose estimator; no pre-trained shape models are needed. Furthermore, (iii) a medium-sized dataset of approximately 1000 low-resolution human body scans is sufficient to achieve competitive performance on the challenging FAUST surface correspondence benchmark. The training and evaluation code will be made available for research purposes to facilitate end-to-end shape model training on novel datasets with minimal setup cost.
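The idea of jointly optimizing all parameters of one objective with a non-linear least squares solver can be illustrated on a toy problem. The sketch below is not the authors' model or solver: it assumes 2D point sets with known correspondences, a shared template plus a per-scan scale and translation, and uses SciPy's `least_squares` as a stand-in for an off-the-shelf solver. All function names (`unpack`, `residuals`, `fit_jointly`, `make_toy_scans`) and the `reg_weight` prior are hypothetical choices made for illustration.

```python
import numpy as np
from scipy.optimize import least_squares


def unpack(theta, n_scans, n_points):
    """Split the flat parameter vector into template, scales, translations."""
    template = theta[: 2 * n_points].reshape(n_points, 2)
    scales = theta[2 * n_points : 2 * n_points + n_scans]
    translations = theta[2 * n_points + n_scans :].reshape(n_scans, 2)
    return template, scales, translations


def residuals(theta, scans, reg_weight=1e-2):
    """Single stacked residual vector over shared and per-scan parameters."""
    n_scans, n_points, _ = scans.shape
    template, scales, translations = unpack(theta, n_scans, n_points)
    # Data term: each scan should match the scaled, translated shared template.
    predicted = scales[:, None, None] * template[None] + translations[:, None, :]
    data = (predicted - scans).ravel()
    # Weak priors (an assumption of this sketch) remove the scale/translation
    # ambiguity between the template and the per-scan parameters.
    prior = reg_weight * np.concatenate([scales - 1.0, translations.ravel()])
    return np.concatenate([data, prior])


def fit_jointly(scans):
    """Optimize template, scales and translations in one solver call."""
    n_scans, n_points, _ = scans.shape
    theta0 = np.concatenate(
        [scans.mean(axis=0).ravel(), np.ones(n_scans), np.zeros(2 * n_scans)]
    )
    result = least_squares(residuals, theta0, args=(scans,))
    return unpack(result.x, n_scans, n_points), result


def make_toy_scans(n_scans=5, n_points=8, noise=0.0, seed=0):
    """Synthetic 'scans': one random template under per-scan scale and shift."""
    rng = np.random.default_rng(seed)
    template = rng.normal(size=(n_points, 2))
    scales = rng.uniform(0.8, 1.2, size=n_scans)
    translations = rng.normal(scale=0.5, size=(n_scans, 2))
    scans = scales[:, None, None] * template[None] + translations[:, None, :]
    return scans + noise * rng.normal(size=scans.shape)


if __name__ == "__main__":
    scans = make_toy_scans()
    (template, scales, translations), result = fit_jointly(scans)
    print("final cost:", result.cost)
```

The point of the sketch is only that shared parameters (the template) and per-instance parameters (scale, translation) live in one parameter vector and one residual function, so a generic solver updates them together rather than in alternating stages.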

Notes

1. https://github.com/Intelligent-Systems-Research-Group/JOMS/

Acknowledgment

We thank A. Bender for the data setup figure. We thank J. Wetzel and N. Link for technical discussion. This work was supported by the German Federal Ministry of Education and Research (BMBF) under Grant 13FH025IX6.

Author information

Authors and Affiliations

Intelligent Systems Research Group (ISRG), Karlsruhe University of Applied Sciences, Karlsruhe, Germany
Samuel Zeitvogel, Johannes Dornheim & Astrid Laubenheimer

Corresponding authors

Correspondence to Samuel Zeitvogel, Johannes Dornheim or Astrid Laubenheimer.

Editor information

Editors and Affiliations

University of Oxford, Oxford, UK
Andrea Vedaldi
Graz University of Technology, Graz, Austria
Horst Bischof
University of Freiburg, Freiburg im Breisgau, Germany
Thomas Brox
University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Jan-Michael Frahm

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 2 (mp4 1352 KB)

Supplementary material 3 (mp4 23276 KB)

Supplementary material 4 (mp4 277 KB)

Supplementary material 5 (mp4 567 KB)

Supplementary material 6 (mp4 475 KB)

Supplementary material 7 (mp4 988 KB)

Supplementary material 8 (mp4 4841 KB)

Supplementary material 1 (pdf 19185 KB)

About this paper

Cite this paper

Zeitvogel, S., Dornheim, J., Laubenheimer, A. (2020). Joint Optimization for Multi-person Shape Models from Markerless 3D-Scans. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12363. Springer, Cham. https://doi.org/10.1007/978-3-030-58523-5_3

DOI: https://doi.org/10.1007/978-3-030-58523-5_3
Published: 04 December 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58522-8
Online ISBN: 978-3-030-58523-5
eBook Packages: Computer Science (R0)
