3D Human Shape Reconstruction from a Polarization Image

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12359)

Abstract

This paper tackles the problem of estimating the 3D body shape of clothed humans from a single polarized 2D image, i.e., a polarization image. Polarization images are known to capture the polarized reflected light that preserves rich geometric cues of an object, which has motivated their recent use in reconstructing the surface normals of objects of interest. Inspired by recent advances in human shape estimation from single color images, in this paper we attempt to estimate human body shape by leveraging the geometric cues from a single polarization image. A dedicated two-stage deep learning approach, SfP, is proposed: given a polarization image, stage one infers a fine-detailed body surface normal map; stage two reconstructs the 3D body shape, including clothing details. Empirical evaluations on a synthetic dataset (SURREAL) as well as a real-world dataset (PHSPD) demonstrate the qualitative and quantitative performance of our approach in estimating human poses and shapes. This indicates that the polarization camera is a promising alternative to the more conventional color or depth imaging for human shape estimation. Furthermore, normal maps inferred from polarization imaging play a significant role in accurately recovering the body shapes of clothed people.
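The geometric cue mentioned above comes from the standard shape-from-polarization measurement model: a division-of-focal-plane polarization camera captures intensities behind linear polarizers at 0, 45, 90 and 135 degrees, from which the degree (DoLP) and angle (AoLP) of linear polarization are recovered; the AoLP constrains the azimuth of the surface normal (up to a π ambiguity). The sketch below is not the paper's network, only the textbook preprocessing step that such methods build on; the function names are illustrative.

```python
import numpy as np

def simulate(i_un, rho, phi, theta):
    """Ideal polarizer model: I(theta) = (I_un / 2) * (1 + rho * cos(2*theta - 2*phi))."""
    return 0.5 * i_un * (1.0 + rho * np.cos(2.0 * theta - 2.0 * phi))

def stokes_from_four_angles(i0, i45, i90, i135):
    """Recover total intensity, DoLP and AoLP from the four polarizer-angle images."""
    s0 = i0 + i90                        # Stokes S0: total intensity
    s1 = i0 - i90                        # Stokes S1
    s2 = i45 - i135                      # Stokes S2
    dolp = np.sqrt(s1**2 + s2**2) / s0   # degree of linear polarization
    aolp = 0.5 * np.arctan2(s2, s1)      # angle of linear polarization
    return s0, dolp, aolp

# Synthetic check: generate the four measurements from known ground truth,
# then recover DoLP and AoLP from them.
rho_true, phi_true = 0.4, np.pi / 8
angles = np.deg2rad([0, 45, 90, 135])
i0, i45, i90, i135 = (simulate(1.0, rho_true, phi_true, t) for t in angles)

s0, dolp, aolp = stokes_from_four_angles(i0, i45, i90, i135)
# dolp ≈ 0.4 and aolp ≈ π/8, matching the ground truth
```

The same formulas apply per pixel to real polarization images (arrays instead of scalars); resolving the π-ambiguity of the azimuth and recovering the zenith angle is where learned priors, such as those in this paper, come in.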

Keywords

Human pose and shape estimation · Clothed 3D human body · Shape from polarization

Notes

Acknowledgement

This work is supported by the NSERC Discovery Grants, and the University of Alberta-Huawei Joint Innovation Collaboration grants.

Supplementary material

Supplementary material 1: 504468_1_En_21_MOESM1_ESM.zip (32.9 MB)

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. University of Alberta, Edmonton, Canada
  2. Simon Fraser University, Burnaby, Canada
  3. School of Automation, China University of Geosciences, Wuhan, China
  4. University of Guelph, Guelph, Canada
