Abstract
This paper presents a novel method for 3D human pose and shape estimation from images with sparse views, using joint points and silhouettes, based on a parametric model. Firstly, the parametric model is fitted to the joint points estimated by deep learning-based human pose estimation. Then, we extract the correspondence between the parametric model of pose fitting and silhouettes in 2D and 3D space. A novel energy function based on the correspondence is built and minimized to fit a parametric model to the silhouettes. Our approach uses comprehensive shape information because the energy function of silhouettes is built from both 2D and 3D space. This also means that our method only needs images from sparse views, which balances data used and the required prior information. Results on synthetic data and real data demonstrate the competitive performance of our approach on pose and shape estimation of the human body.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Alldieck, T., Magnor, M., Xu, W.P., Theobalt, C., Pons-Moll, G.: Video based reconstruction of 3d people models. In: CVPR, pp. 8387–8397 (2018)
Anguelov, D., Srinivasan, P., Koller, D., Thrun, S., Rodgers, J., Davis, J.: Scape: shape completion and animation of people. ACM Trans. Graph. 24, 408–416 (2005)
Xiao, B., Wu, H., Wei, Y.: Simple baselines for human pose estimation and tracking. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11210, pp. 472–487. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01231-1_29
Bogo, F., Black, M.J., Loper, M., Romero, J.: Detailed full-body reconstructions of moving people from monocular RGB-D sequences. In: ICCV, pp. 2300–2308 (2015)
Bogo, F., Kanazawa, A., Lassner, C., Gehler, P., Romero, J., Black, M.J.: Keep It SMPL: automatic estimation of 3d human pose and shape from a single image. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9909, pp. 561–578. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46454-1_34
Dou, M.S., et al.: Fusion4d: real-time performance capture of challenging scenes. ACM Trans. Graph. 35, 114:1–114:13 (2016)
Geman, S., McClure, D.: Statistical methods for tomographic image reconstruction. Bull. Int. Stat. Inst. 52, 5–21 (1987)
Guan, P., Weiss, A., Bãlan, A.O., Black, M.J.: Estimating human shape and pose from a single image. In: ICCV, pp. 1381–1388 (2009)
Huang, Y., et al.: Towards accurate marker-less human shape and pose estimation over time. In: 2017 International Conference on 3D Vision (3DV), pp. 421–430 (2017)
Innmann, M., Zollhöfer, M., Nießner, M., Theobalt, C., Stamminger, M.: VolumeDeform: real-time volumetric non-rigid reconstruction. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9912, pp. 362–379. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46484-8_22
Izadi, S., et al.: Kinectfusion: real-time 3d reconstruction and interaction using a moving depth camera. In: Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology (UIST), pp. 559–568 (2011)
Kanazawa, A., Black, M.J., Jacobs, D.W., Malik, J.: End-to-end recovery of human shape and pose. In: CVPR, pp. 7122–7131 (2018)
Kolotouros, N., Pavlakos, G., Black, M.J., Daniilidis, K.: Learning to reconstruct 3d human pose and shape via model-fitting in the loop. In: ICCV, pp. 2252–2261 (2019)
Law, H., Deng, J.: Cornernet: detecting objects as paired keypoints. Int. J. Comput. Vision, 1–15 (2019)
Li, Z., Heyden, A., Oskarsson, M.: Parametric model-based 3d human shape and pose estimation from multiple views. In: 21st Scandinavian Conference on Image Analysis (SCIA), pp. 336–347 (2019)
Loper, M., Mahmood, N., Romero, J., Pons-Moll, G., Black, M.J.: Smpl: a skinned multi-person linear model. ACM Trans. Graph. 34, 248:1–248:16 (2015)
Loper, M.M., Black, M.J.: OpenDR: an approximate differentiable renderer. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8695, pp. 154–169. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10584-0_11
Newcombe, R.A., Fox, D., Seitz, S.M.: Dynamicfusion: reconstruction and tracking of non-rigid scenes in real-time. In: CVPR, pp. 343–352 (2015)
Pavlakos, G., Choutas, V., Ghorbani, N., Bolkart, T., Osman, A., Tzionas, D., Black, M.J.: Expressive body capture: 3d hands, face, and body from a single image. In: CVPR, pp. 10975–10985 (2019)
Pavlakos, G., Zhu, L.Y., Zhou, X.W., Daniilidis, K.: Learning to estimate 3D human pose and shape from a single color image. In: CVPR, pp. 459–468 (2018)
Pavlakos, G., Kolotouros, N., Daniilidis, K.: Texturepose: Supervising human mesh estimation with texture consistency. In: ICCV pp. 803–812 (2019)
Sigal, L., Balan, A., Black, M.J.: Combined discriminative and generative articulated pose and non-rigid shape estimation. In: NIPS, pp. 1337–1344 (2008)
Slavcheva, M., Baust, M., Cremers, D., Ilic, S.: Killingfusion: on-rigid 3d reconstruction without correspondences. In: CVPR, pp. 5474–5483 (2017)
Varol, G., Ceylan, D., Russell, B., Yang, J., Yumer, E., Laptev, I., Schmid, C.: BodyNet: volumetric inference of 3D human body shapes. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11211, pp. 20–38. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01234-2_2
Vlasic, D., Baran, I., Matusik, W., Popović, J.: Articulated mesh animation from multi-view silhouettes. ACM Trans. Graph. 27, 97:1–97:9 (2008)
Weiss, A., Hirshberg, D., Black, M.J.: Home 3D body scans from noisy image and range data. In: ICCV, pp. 1951–1958 (2011)
Xu, L., Su, Z., Han, L., Yu, T., Liu, Y., FANG, L.: Unstructuredfusion: Realtime 4d geometry and texture reconstruction using commercial rgbd cameras. IEEE Trans. Pattern Anal. Mach. Intell., 1 (2019)
Xu, W.P., et al.: Monoperfcap: human performance capture from monocular video. ACM Trans. Graph. 37, 27:1–27:15 (2016)
Ye, G., Liu, Y., Hasler, N., Ji, X., Dai, Q., Theobalt, C.: Performance capture of interacting characters with handheld kinects. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7573, pp. 828–841. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33709-3_59
Yu, T., et al.: Bodyfusion: real-time capture of human motion and surface geometry using a single depth camera. In: ICCV, pp. 910–919 (2017)
Yu, T., et al.: Doublefusion: real-time capture of human performances with inner body shapes from a single depth sensor. In: CVPR, pp. 7287–7296 (2018)
Acknowledgements
We would like to appreciate the support from ELLIIT, eSSENCE and the China Scholarship Council (CSC) for our research.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Li, Z., Heyden, A., Oskarsson, M. (2021). A Novel Joint Points and Silhouette-Based Method to Estimate 3D Human Pose and Shape. In: Del Bimbo, A., et al. Pattern Recognition. ICPR International Workshops and Challenges. ICPR 2021. Lecture Notes in Computer Science(), vol 12661. Springer, Cham. https://doi.org/10.1007/978-3-030-68763-2_4
Download citation
DOI: https://doi.org/10.1007/978-3-030-68763-2_4
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-68762-5
Online ISBN: 978-3-030-68763-2
eBook Packages: Computer ScienceComputer Science (R0)