Reconstructing NBA Players

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12350)

Abstract

Great progress has been made in 3D body pose and shape estimation from a single photo. Yet state-of-the-art results still suffer from errors caused by challenging body poses, the difficulty of modeling clothing, and self-occlusions. Basketball games are a particularly difficult domain, as they exhibit all of these problems. In this paper, we introduce a new approach for reconstructing basketball players that outperforms the state of the art. Key to our approach is a new method for creating poseable, skinned models of NBA players, and a large database of meshes (derived from the NBA2K19 video game) that we are releasing to the research community. Based on these models, we introduce a new method that takes as input a single photo of a clothed player in any basketball pose and outputs a high-resolution mesh and 3D pose for that player. We demonstrate substantial improvement over state-of-the-art single-image methods for body shape reconstruction. Code and dataset are available at http://grail.cs.washington.edu/projects/nba_players/.

Keywords

3D human reconstruction 

Notes

Acknowledgments

This work was supported by NSF/Intel Visual and Experimental Computing Award #1538618 and the UW Reality Lab funding from Facebook, Google and Futurewei. We thank Visual Concepts for allowing us to capture, process, and share NBA2K19 data for research.

Supplementary material

Supplementary material 1: 504441_1_En_11_MOESM1_ESM.pdf (PDF, 8008 KB)

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. University of Washington, Seattle, USA
