EgoBody: Human Body Shape and Motion of Interacting People from Head-Mounted Devices

  • Conference paper
  • First Online:
Computer Vision – ECCV 2022 (ECCV 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13666)

Abstract

Understanding social interactions from egocentric views is crucial for many applications, ranging from assistive robotics to AR/VR. Key to reasoning about interactions is to understand the body pose and motion of the interaction partner from the egocentric view. However, research in this area is severely hindered by the lack of datasets. Existing datasets are limited in terms of either size, capture/annotation modalities, ground-truth quality, or interaction diversity. We fill this gap by proposing EgoBody, a novel large-scale dataset for human pose, shape and motion estimation from egocentric views, during interactions in complex 3D scenes. We employ Microsoft HoloLens2 headsets to record rich egocentric data streams (including RGB, depth, eye gaze, head and hand tracking). To obtain accurate 3D ground truth, we calibrate the headset with a multi-Kinect rig and fit expressive SMPL-X body meshes to multi-view RGB-D frames, reconstructing 3D human shapes and poses relative to the scene, over time. We collect 125 sequences, spanning diverse interaction scenarios, and propose the first benchmark for 3D full-body pose and shape estimation of the interaction partner from egocentric views. We extensively evaluate state-of-the-art methods, highlight their limitations in the egocentric scenario, and address such limitations by leveraging our high-quality annotations. Data and code are available at https://sanweiliti.github.io/egobody/egobody.html.
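The ground-truth pipeline described in the abstract calibrates the HoloLens2 headset against a multi-Kinect rig, which at its core requires estimating a rigid transform between two camera coordinate frames from corresponding 3D points. The sketch below illustrates that step with the standard Kabsch/Procrustes solution; it is not the authors' calibration code, and the helper name `rigid_align` is hypothetical.

```python
import numpy as np

def rigid_align(src, dst):
    """Least-squares rigid transform (R, t) mapping src -> dst.

    src, dst: (N, 3) arrays of corresponding 3D points, e.g. calibration
    markers observed in both the headset frame and the Kinect-rig frame.
    Returns R (3x3 rotation) and t (3,) such that dst ~= src @ R.T + t.
    """
    src_mean, dst_mean = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - src_mean, dst - dst_mean
    # Cross-covariance of the centered point sets
    H = src_c.T @ dst_c
    U, _, Vt = np.linalg.svd(H)
    # Guard against reflections (det = -1 cases)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_mean - R @ src_mean
    return R, t
```

Given a handful of correspondences, `rigid_align` recovers the headset-to-rig transform exactly on noise-free synthetic data, and in the least-squares sense under noise; in practice such an estimate is typically refined further (e.g. with ICP-style registration).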

F. Bogo is now at Meta Reality Labs Research.


Notes

  1. The results on 3DPW are taken from the respective original papers.


Acknowledgements

This work was supported by the SNF grant 200021 204840 and Microsoft Mixed Reality & AI Zurich Lab PhD scholarship. Qianli Ma is partially funded by the Max Planck ETH Center for Learning Systems.

Author information

Corresponding author

Correspondence to Siwei Zhang.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 4378 KB)


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Zhang, S. et al. (2022). EgoBody: Human Body Shape and Motion of Interacting People from Head-Mounted Devices. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13666. Springer, Cham. https://doi.org/10.1007/978-3-031-20068-7_11

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-20068-7_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20067-0

  • Online ISBN: 978-3-031-20068-7

  • eBook Packages: Computer Science (R0)
