A Novel Joint Points and Silhouette-Based Method to Estimate 3D Human Pose and Shape

Li, Zhongguo; Heyden, Anders; Oskarsson, Magnus

doi:10.1007/978-3-030-68763-2_4

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12661)

Included in the following conference series:

International Conference on Pattern Recognition

Abstract

This paper presents a novel method for estimating 3D human pose and shape from sparse-view images, using joint points and silhouettes together with a parametric model. First, the parametric model is fitted to joint points estimated by a deep learning-based human pose estimator. Then, we extract correspondences between the pose-fitted parametric model and the silhouettes in both 2D and 3D space. A novel energy function based on these correspondences is built and minimized to fit the parametric model to the silhouettes. Because the silhouette energy is built from both 2D and 3D space, our approach exploits comprehensive shape information. This also means that our method needs images from only a few views, which balances the amount of data used against the prior information required. Results on synthetic and real data demonstrate the competitive performance of our approach on human pose and shape estimation.
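
As a rough illustration of the first stage only (fitting a parametric model to detected joint points from a few views), the sketch below solves a linear least-squares problem. It is not the authors' method: the "model" here is a hypothetical toy with just a global scale and a 3D translation applied to a fixed template skeleton, the cameras are assumed to be known orthographic projections, and the function name `fit_scale_translation` is invented for this example. The silhouette energy described in the abstract is not reproduced.

```python
import numpy as np

def fit_scale_translation(template, views, detections):
    """Least-squares fit of a scale s and 3D translation t so that
    P @ (s * X_j + t) matches the detected 2D joint j in every view.

    template:   (J, 3) template joint positions (toy stand-in for a body model)
    views:      list of (2, 3) orthographic projection matrices (assumed known)
    detections: list of (J, 2) detected 2D joints, one array per view
    """
    rows, rhs = [], []
    for P, d in zip(views, detections):
        for X, x in zip(template, d):
            # P @ (s*X + t) = (P @ X) * s + P @ t, which is linear in [s, t]
            rows.append(np.hstack([(P @ X)[:, None], P]))
            rhs.append(x)
    A = np.vstack(rows)       # shape (2 * J * V, 4)
    b = np.concatenate(rhs)   # shape (2 * J * V,)
    params, *_ = np.linalg.lstsq(A, b, rcond=None)
    return params[0], params[1:]

# Toy data: four non-coplanar joints seen from a front and a side view.
template = np.array([[0.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0],
                     [0.5, 0.5, 0.2],
                     [-0.3, 0.8, -0.4]])
front = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])  # drops z
side = np.array([[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]])   # drops x
s_true, t_true = 1.2, np.array([0.1, -0.3, 0.5])
posed = s_true * template + t_true
detections = [posed @ front.T, posed @ side.T]         # noise-free "detections"

s_hat, t_hat = fit_scale_translation(template, [front, side], detections)
```

With two views, every component of the translation is observed at least once, which is why this toy problem is well posed. A realistic body model is nonlinear in its pose parameters and would need an iterative optimizer rather than a single linear solve.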

Acknowledgements

We gratefully acknowledge the support of ELLIIT, eSSENCE, and the China Scholarship Council (CSC) for our research.

Author information

Authors and Affiliations

Lund University, Lund, Sweden
Zhongguo Li, Anders Heyden & Magnus Oskarsson

Corresponding author

Correspondence to Zhongguo Li.

Editor information

Editors and Affiliations

Dipartimento di Ingegneria dell’Informazione, University of Firenze, Firenze, Italy
Alberto Del Bimbo
Dipartimento di Ingegneria “Enzo Ferrari”, Università di Modena e Reggio Emilia, Modena, Italy
Rita Cucchiara
Department of Computer Science, Boston University, Boston, MA, USA
Stan Sclaroff
Dipartimento di Matematica e Informatica, University of Catania, Catania, Italy
Giovanni Maria Farinella
Cloud & AI, JD.COM, Beijing, China
Tao Mei
Dipartimento di Ingegneria dell’Informazione, University of Firenze, Firenze, Italy
Marco Bertini
Computational Sciences Department, National Institute of Astrophysics, Optics and Electronics (INAOE), Tonantzintla, Puebla, Mexico
Hugo Jair Escalante
Dipartimento di Ingegneria “Enzo Ferrari”, Università di Modena e Reggio Emilia, Modena, Italy
Roberto Vezzani

About this paper

Cite this paper

Li, Z., Heyden, A., Oskarsson, M. (2021). A Novel Joint Points and Silhouette-Based Method to Estimate 3D Human Pose and Shape. In: Del Bimbo, A., et al. Pattern Recognition. ICPR International Workshops and Challenges. ICPR 2021. Lecture Notes in Computer Science, vol 12661. Springer, Cham. https://doi.org/10.1007/978-3-030-68763-2_4

DOI: https://doi.org/10.1007/978-3-030-68763-2_4
Published: 21 February 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-68762-5
Online ISBN: 978-3-030-68763-2
eBook Packages: Computer Science, Computer Science (R0)