CHORE: Contact, Human and Object Reconstruction from a Single RGB Image

  • Conference paper
Computer Vision – ECCV 2022 (ECCV 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13662)

Included in the following conference series: European Conference on Computer Vision (ECCV)

Abstract

Most prior work on perceiving 3D humans from images reasons about the human in isolation, without considering the surroundings. However, humans constantly interact with surrounding objects, which calls for models that reason not only about the human but also about the object and their interaction. The problem is extremely challenging due to heavy occlusions between humans and objects, diverse interaction types and depth ambiguity. In this paper, we introduce CHORE, a novel method that learns to jointly reconstruct the human and the object from a single RGB image. CHORE takes inspiration from recent advances in implicit surface learning and classical model-based fitting. We compute a neural reconstruction of human and object represented implicitly with two unsigned distance fields, a correspondence field to a parametric body and an object pose field. This allows us to robustly fit a parametric body model and a 3D object template while reasoning about interactions. Furthermore, prior pixel-aligned implicit learning methods use synthetic data and make assumptions that are not met in real data. We propose an elegant depth-aware scaling that allows more efficient shape learning on real data. Experiments show that our joint reconstruction learned with the proposed strategy significantly outperforms the SOTA. Our code and models are available at https://virtualhumans.mpi-inf.mpg.de/chore
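
The abstract describes a two-stage approach: neural prediction of unsigned distance fields (plus a body correspondence field and an object pose field) from the image, followed by a joint, model-based fit of a parametric body and a 3D object template guided by those fields. As a rough illustration of the fitting stage only, below is a minimal PyTorch sketch: the `TinyUDF` networks are untrained placeholders (the actual CHORE predictors are pixel-aligned and image-conditioned), the body is a simple point cloud with a free translation rather than the SMPL model, and `joint_fit` with its contact term is a hypothetical simplification, not the paper's losses or released code.

```python
import torch
import torch.nn as nn


class TinyUDF(nn.Module):
    """Stand-in for a learned unsigned distance field f: R^3 -> R_{>=0}.

    The real CHORE predictors are pixel-aligned and conditioned on image
    features; this untrained MLP only serves to make the fitting loop runnable.
    """

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Softplus(),  # Softplus keeps distances non-negative
        )

    def forward(self, pts):                   # pts: (N, 3)
        return self.net(pts).squeeze(-1)      # (N,) unsigned distances


def axis_angle_to_matrix(w):
    """Rodrigues' formula: axis-angle vector w (3,) -> rotation matrix (3, 3)."""
    theta = w.norm() + 1e-8
    k = w / theta
    zero = torch.zeros((), dtype=w.dtype)
    K = torch.stack([
        torch.stack([zero, -k[2], k[1]]),
        torch.stack([k[2], zero, -k[0]]),
        torch.stack([-k[1], k[0], zero]),
    ])
    return torch.eye(3, dtype=w.dtype) + torch.sin(theta) * K + (1 - torch.cos(theta)) * (K @ K)


def joint_fit(human_udf, object_udf, body_pts, obj_template, contact_pairs=None,
              iters=300, lr=1e-2, w_contact=0.1):
    """Fit a simplified body and a rigid object template to two neural distance
    fields by gradient descent (illustrative stand-in for the paper's joint fit).
    """
    body_t = torch.zeros(3, requires_grad=True)   # body translation (SMPL stand-in)
    obj_w = torch.zeros(3, requires_grad=True)    # object rotation, axis-angle
    obj_t = torch.zeros(3, requires_grad=True)    # object translation
    opt = torch.optim.Adam([body_t, obj_w, obj_t], lr=lr)

    for _ in range(iters):
        opt.zero_grad()
        body = body_pts + body_t
        obj = obj_template @ axis_angle_to_matrix(obj_w).T + obj_t

        # Data terms: both surfaces should lie on the zero level set of their fields.
        loss = human_udf(body).mean() + object_udf(obj).mean()

        # Optional interaction term: pull pre-selected body/object point pairs
        # together, a crude proxy for contact reasoning during joint fitting.
        if contact_pairs is not None:
            b_idx, o_idx = contact_pairs
            loss = loss + w_contact * (body[b_idx] - obj[o_idx]).norm(dim=-1).mean()

        loss.backward()
        opt.step()

    return body_t.detach(), obj_w.detach(), obj_t.detach()


if __name__ == "__main__":
    torch.manual_seed(0)
    human_udf, object_udf = TinyUDF(), TinyUDF()   # untrained placeholders
    body_pts = 0.3 * torch.randn(500, 3)           # dummy "body" point cloud
    obj_template = 0.1 * torch.randn(200, 3)       # dummy object template
    print(joint_fit(human_udf, object_udf, body_pts, obj_template, iters=50))
```

The sketch keeps only the core idea stated in the abstract: both surfaces are pulled toward the zero level sets of their predicted distance fields while a simple interaction term couples the two; the paper additionally uses the predicted correspondence and object pose fields, and the SMPL body model, in its fitting objective.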

References

  1. https://www.mturk.com

  2. http://virtualhumans.mpi-inf.mpg.de/people.html

  3. Alldieck, T., Magnor, M., Bhatnagar, B.L., Theobalt, C., Pons-Moll, G.: Learning to reconstruct people in clothing from a single RGB camera. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1175–1186 (2019)

  4. Alldieck, T., Magnor, M., Xu, W., Theobalt, C., Pons-Moll, G.: Detailed human avatars from monocular video. In: International Conference on 3D Vision, pp. 98–109, September 2018

  5. Alldieck, T., Magnor, M., Xu, W., Theobalt, C., Pons-Moll, G.: Video based reconstruction of 3D people models. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 8387–8397 (2018)

  6. Alldieck, T., Pons-Moll, G., Theobalt, C., Magnor, M.: Tex2shape: detailed full human body geometry from a single image. In: IEEE International Conference on Computer Vision (ICCV), pp. 2293–2303. IEEE, October 2019

  7. Bhatnagar, B.L., Sminchisescu, C., Theobalt, C., Pons-Moll, G.: Combining implicit function learning and parametric models for 3D human reconstruction. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12347, pp. 311–329. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58536-5_19

  8. Bhatnagar, B.L., Sminchisescu, C., Theobalt, C., Pons-Moll, G.: Loopreg: self-supervised learning of implicit surface correspondences, pose and shape for 3D human mesh registration. In: Advances in Neural Information Processing Systems (NeurIPS), December 2020

  9. Bhatnagar, B.L., Tiwari, G., Theobalt, C., Pons-Moll, G.: Multi-garment net: learning to dress 3D people from images. In: IEEE International Conference on Computer Vision (ICCV), pp. 5420–5430. IEEE, October 2019

  10. Bhatnagar, B.L., Xie, X., Petrov, I., Sminchisescu, C., Theobalt, C., Pons-Moll, G.: Behave: dataset and method for tracking human object interactions. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 15935–15946, June 2022

  11. Bogo, F., Kanazawa, A., Lassner, C., Gehler, P., Romero, J., Black, M.J.: Keep It SMPL: automatic estimation of 3D human pose and shape from a single image. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9909, pp. 561–578. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46454-1_34

  12. Brahmbhatt, S., Ham, C., Kemp, C.C., Hays, J.: ContactDB: analyzing and predicting grasp contact via thermal imaging. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8709–8719 (2019). https://contactdb.cc.gatech.edu

  13. Brahmbhatt, S., Handa, A., Hays, J., Fox, D.: Contactgrasp: functional multi-finger grasp synthesis from contact. In: IROS, pp. 2386–2393 (2019)

  14. Brahmbhatt, S., Tang, C., Twigg, C.D., Kemp, C.C., Hays, J.: ContactPose: a dataset of grasps with object contact and hand pose. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12358, pp. 361–378. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58601-0_22

  15. Chang, A.X., et al.: ShapeNet: an information-rich 3D model repository. Tech. Rep. arXiv:1512.03012 [cs.GR], Stanford University – Princeton University – Toyota Technological Institute at Chicago (2015)

  16. Chen, Y., Huang, S., Yuan, T., Qi, S., Zhu, Y., Zhu, S.C.: Holistic++ scene understanding: single-view 3D holistic scene parsing and human pose estimation with human-object interaction and physical commonsense. In: The IEEE International Conference on Computer Vision (ICCV), pp. 8648–8657 (2019)

  17. Chibane, J., Mir, A., Pons-Moll, G.: Neural unsigned distance fields for implicit function learning. In: Neural Information Processing Systems (NeurIPS), December 2020

  18. Choy, C.B., Xu, D., Gwak, J.Y., Chen, K., Savarese, S.: 3D-R2N2: a unified approach for single and multi-view 3D object reconstruction. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9912, pp. 628–644. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46484-8_38

  19. Corona, E., Pons-Moll, G., Alenya, G., Moreno-Noguer, F.: Learned vertex descent: a new direction for 3D human model fitting. In: European Conference on Computer Vision (ECCV). Springer (October 2022)

  20. Corona, E., Pumarola, A., Alenya, G., Moreno-Noguer, F., Rogez, G.: Ganhand: predicting human grasp affordances in multi-object scenes. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5031–5041, June 2020

  21. Ehsani, K., Tulsiani, S., Gupta, S., Farhadi, A., Gupta, A.: Use the force, luke! learning to predict physical forces by simulating effects. In: CVPR, pp. 224–233 (2020)

  22. Fan, H., Su, H., Guibas, L.J.: A point set generation network for 3D object reconstruction from a single image. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 605–613, July 2017

  23. Fieraru, M., Zanfir, M., Oneata, E., Popa, A., Olaru, V., Sminchisescu, C.: Learning complex 3D human self-contact. CoRR abs/2012.10366 (2020). https://arxiv.org/abs/2012.10366

  24. Fieraru, M., Zanfir, M., Oneata, E., Popa, A.I., Olaru, V., Sminchisescu, C.: Three-dimensional reconstruction of human interactions. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7214–7223, June 2020

  25. Fu, K., Peng, J., He, Q., Zhang, H.: Single image 3D object reconstruction based on deep learning: a review. Multimedia Tools Appl. 80(1), 463–498 (2020). https://doi.org/10.1007/s11042-020-09722-8

  26. Guler, R.A., Kokkinos, I.: Holopose: holistic 3D human reconstruction in-the-wild. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10884–10894, June 2019

  27. Guo, C., Chen, X., Song, J., Hilliges, O.: Human performance capture from monocular video in the wild. In: 2021 International Conference on 3D Vision (3DV), pp. 889–898. IEEE (2021)

  28. Habermann, M., Xu, W., Zollhoefer, M., Pons-Moll, G., Theobalt, C.: Deepcap: monocular human performance capture using weak supervision. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5052–5063. IEEE, June 2020

  29. Habermann, M., Xu, W., Zollhöfer, M., Pons-Moll, G., Theobalt, C.: Livecap: real-time human performance capture from monocular video. ACM Trans. Graph. 38(2), 14:1–14:17, March 2019. https://doi.org/10.1145/3311970

  30. Hassan, M., Choutas, V., Tzionas, D., Black, M.J.: Resolving 3D human pose ambiguities with 3D scene constraints. In: International Conference on Computer Vision, pp. 2282–2292 (2019)

  31. Hassan, M., Ghosh, P., Tesch, J., Tzionas, D., Black, M.J.: Populating 3D scenes by learning human-scene interaction. In: Proceedings IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 14708–14718, June 2021

  32. Hasson, Y., et al.: Learning joint reconstruction of hands and manipulated objects. In: CVPR, pp. 11807–11816 (2019)

  33. Huang, C.H.P., et al.: Capturing and inferring dense full-body human-scene contact. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 13274–13285, June 2022

  34. Huang, Z., Xu, Y., Lassner, C., Li, H., Tung, T.: Arch: animatable reconstruction of clothed humans. ArXiv abs/2004.04572 (2020)

  35. Häne, C., Tulsiani, S., Malik, J.: Hierarchical surface prediction for 3D object reconstruction. In: 2017 International Conference on 3D Vision (3DV), pp. 412–420 (2017). https://doi.org/10.1109/3DV.2017.00054

  36. Jiang, B., Zhang, J., Hong, Y., Luo, J., Liu, L., Bao, H.: Bcnet: learning body and cloth shape from a single image. In: European Conference on Computer Vision, pp. 18–35. Springer (2020). https://doi.org/10.1007/978-3-030-58565-5_2

  37. Jiang, J., et al.: Avatarposer: articulated full-body pose tracking from sparse motion sensing. In: European Conference on Computer Vision (2022)

  38. Jiang, W., Kolotouros, N., Pavlakos, G., Zhou, X., Daniilidis, K.: Coherent reconstruction of multiple humans from a single image. In: CVPR, pp. 5579–5588 (2020)

  39. Jiang, Y., et al.: Neuralfusion: neural volumetric rendering under human-object interactions. arXiv preprint arXiv:2202.12825 (2022)

  40. Kanazawa, A., Black, M.J., Jacobs, D.W., Malik, J.: End-to-end recovery of human shape and pose. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 7122–7131. IEEE Computer Society (2018)

  41. Kar, A., Tulsiani, S., Malik, J.: Category-specific object reconstruction from a single image. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1966–1974 (2015)

  42. Karunratanakul, K., Yang, J., Zhang, Y., Black, M., Muandet, K., Tang, S.: Grasping field: learning implicit representations for human grasps. In: 8th International Conference on 3D Vision, pp. 333–344. IEEE, November 2020. https://doi.org/10.1109/3DV50981.2020.00043

  43. Kocabas, M., Athanasiou, N., Black, M.J.: Vibe: video inference for human body pose and shape estimation. In: Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5252–5262. IEEE, June 2020. https://doi.org/10.1109/CVPR42600.2020.00530

  44. Kolotouros, N., Pavlakos, G., Black, M.J., Daniilidis, K.: Learning to reconstruct 3D human pose and shape via model-fitting in the loop. In: ICCV, pp. 2252–2261 (2019)

  45. Lei, J., Sridhar, S., Guerrero, P., Sung, M., Mitra, N., Guibas, L.J.: Pix2Surf: learning parametric 3D surface models of objects from images. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12363, pp. 121–138. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58523-5_8

  46. Li, Z., Sedlar, J., Carpentier, J., Laptev, I., Mansard, N., Sivic, J.: Estimating 3D motion and forces of person-object interactions from monocular video. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8640–8649, June 2019

  47. Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48

  48. Liu, J., Shahroudy, A., Perez, M., Wang, G., Duan, L.Y., Kot, A.C.: Ntu rgb+d 120: a large-scale benchmark for 3D human activity understanding. IEEE Trans. Pattern Anal. Mach. Intell. 42(10), 2684–2701 (2019). https://doi.org/10.1109/TPAMI.2019.2916873

  49. Loper, M., Mahmood, N., Romero, J., Pons-Moll, G., Black, M.J.: SMPL: a skinned multi-person linear model. ACM Trans. Graph. 34(6), 1–16 (2015)

  50. Mescheder, L., Oechsle, M., Niemeyer, M., Nowozin, S., Geiger, A.: Occupancy networks: learning 3D reconstruction in function space. In: Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4460–4470 (2019)

  51. Monszpart, A., Guerrero, P., Ceylan, D., Yumer, E., Mitra, N.J.: iMapper: interaction-guided scene mapping from monocular videos. In: ACM SIGGRAPH (2019)

  52. Müller, L., Osman, A.A.A., Tang, S., Huang, C.H.P., Black, M.J.: On self-contact and human pose. In: Proceedings IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9990–9999 (2021)

  53. Müller, N., Wong, Y.S., Mitra, N.J., Dai, A., Nießner, M.: Seeing behind objects for 3D multi-object tracking in RGB-D sequences. In: Proceedings of the Computer Vision and Pattern Recognition (CVPR), pp. 6071–6080. IEEE (2021)

  54. Omran, M., Lassner, C., Pons-Moll, G., Gehler, P., Schiele, B.: Neural body fitting: unifying deep learning and model based human pose and shape estimation. In: International Conference on 3D Vision, pp. 484–494 (2018)

  55. Pavlakos, G., et al.: Expressive body capture: 3D hands, face, and body from a single image. In: Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10975–10985 (2019)

  56. Pavlakos, G., Zhu, L., Zhou, X., Daniilidis, K.: Learning to estimate 3D human pose and shape from a single color image. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 459–468 (2018)

  57. Pons-Moll, G., Rosenhahn, B.: Model-based pose estimation, chap. 9, pp. 139–170. Springer (2011). https://doi.org/10.1007/978-0-85729-997-0_9

  58. Pontes, J.K., Kong, C., Sridharan, S., Lucey, S., Eriksson, A., Fookes, C.: Image2mesh: a learning framework for single image 3D reconstruction. In: ACCV, pp. 365–381. Springer International Publishing (2019)

  59. Rempe, D., Guibas, L.J., Hertzmann, A., Russell, B., Villegas, R., Yang, J.: Contact and human dynamics from monocular video. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12350, pp. 71–87. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58558-7_5

  60. Romero, J., Tzionas, D., Black, M.J.: Embodied hands: modeling and capturing hands and bodies together. ACM Trans. Graph. (Proc. SIGGRAPH Asia) 36(6) (2017)

  61. Rong, Y., Shiratori, T., Joo, H.: Frankmocap: a monocular 3D whole-body pose estimation system via regression and integration. In: IEEE International Conference on Computer Vision Workshops (2021)

  62. Saito, S., Huang, Z., Natsume, R., Morishima, S., Kanazawa, A., Li, H.: Pifu: pixel-aligned implicit function for high-resolution clothed human digitization. In: IEEE International Conference on Computer Vision (ICCV). IEEE (2019)

  63. Saito, S., Simon, T., Saragih, J., Joo, H.: Pifuhd: multi-level pixel-aligned implicit function for high-resolution 3D human digitization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2020)

  64. Savva, M., Chang, A.X., Hanrahan, P., Fisher, M., Nießner, M.: PiGraphs: Learning Interaction Snapshots from Observations. ACM Trans. Graphics (TOG) 35(4) (2016)

  65. Sun, G., et al.: Neural free-viewpoint performance rendering under complex human-object interactions. In: Proceedings of the 29th ACM International Conference on Multimedia (2021)

  66. Sun, X., et al.: Pix3d: dataset and methods for single-image 3D shape modeling. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018)

  67. Taheri, O., Ghorbani, N., Black, M.J., Tzionas, D.: GRAB: a dataset of whole-body human grasping of objects. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12349, pp. 581–600. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58548-8_34

  68. Tiwari, G., Antic, D., Lenssen, J.E., Sarafianos, N., Tung, T., Pons-Moll, G.: Pose-ndf: Modeling human pose manifolds with neural distance fields. In: European Conference on Computer Vision (ECCV). Springer, October 2022

  69. Tiwari, G., Bhatnagar, B.L., Tung, T., Pons-Moll, G.: SIZER: a dataset and model for parsing 3D clothing and learning size sensitive 3D clothing. In: European Conference on Computer Vision (ECCV), pp. 1–18. Springer, August 2020. https://doi.org/10.1007/978-3-030-58580-8_1

  70. Wang, N., Zhang, Y., Li, Z., Fu, Y., Liu, W., Jiang, Y.-G.: Pixel2Mesh: generating 3D mesh models from single RGB images. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11215, pp. 55–71. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01252-6_4

  71. Weng, Z., Yeung, S.: Holistic 3D human and scene mesh estimation from single view images. arXiv preprint arXiv:2012.01591 (2020)

  72. Wu, J., Wang, Y., Xue, T., Sun, X., Freeman, W.T., Tenenbaum, J.B.: Marrnet: 3D shape reconstruction via 2.5D sketches. In: Advances In Neural Information Processing Systems (2017)

  73. Wu, J., Zhang, C., Zhang, X., Zhang, Z., Freeman, W.T., Tenenbaum, J.B.: Learning 3D shape priors for shape completion and reconstruction. In: European Conference on Computer Vision (ECCV), pp. 646–662 (2018)

  74. Xiang, Y., et al.: Objectnet3D: a large scale database for 3D object recognition. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) Computer Vision - ECCV 2016, pp. 160–176. Springer International Publishing (2016). https://doi.org/10.1007/978-3-319-46484-8_10

  75. Xiu, Y., Yang, J., Tzionas, D., Black, M.J.: ICON: implicit clothed humans obtained from normals. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 13296–13306, Jun 2022

  76. Xu, Q., Wang, W., Ceylan, D., Mech, R., Neumann, U.: Disn: deep implicit surface network for high-quality single-view 3D reconstruction. In: Wallach, H., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E., Garnett, R. (eds.) Advances in Neural Information Processing Systems. vol. 32. Curran Associates, Inc. (2019). https://proceedings.neurips.cc/paper/2019/file/39059724f73a9969845dfe4146c5660e-Paper.pdf

  77. Yang, L., Zhan, X., Li, K., Xu, W., Li, J., Lu, C.: CPF: learning a contact potential field to model the hand-object interaction. In: ICCV, pp. 11097–11106 (2021)

  78. Yi, H., et al.: Human-aware object placement for visual environment reconstruction. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3959–3970, June 2022

  79. Zhang, J.Y., Pepose, S., Joo, H., Ramanan, D., Malik, J., Kanazawa, A.: Perceiving 3D human-object spatial arrangements from a single image in the wild. In: European Conference on Computer Vision (ECCV), pp. 34–51 (2020). https://doi.org/10.1007/978-3-030-58610-2_3

  80. Zhang, S., Liu, J., Liu, Y., Ling, N.: Dimnet: dense implicit function network for 3D human body reconstruction. Comput. Graph. 98, 1–10 (2021). https://doi.org/10.1016/j.cag.2021.04.035

  81. Zhang, S., Zhang, Y., Ma, Q., Black, M.J., Tang, S.: Place: proximity learning of articulation and contact in 3D environments. In: International Conference on 3D Vision (3DV), pp. 642–651, November 2020

  82. Zhang, X., Bhatnagar, B.L., Guzov, V., Starke, S., Pons-Moll, G.: Couch: towards controllable human-chair interactions. arXiv preprint arXiv:2205.00541 (May 2022)

  83. Zhang, X., Zhang, Z., Zhang, C., Tenenbaum, J.B., Freeman, W.T., Wu, J.: Learning to reconstruct shapes from unseen classes. In: Advances in Neural Information Processing Systems (NeurIPS) (2018)

  84. Zhao, F., Wang, W., Liao, S., Shao, L.: Learning anchored unsigned distance functions with gradient direction alignment for single-view garment reconstruction. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12674–12683 (2021)

  85. Cao, Z., Gao, H., Mangalam, K., Cai, Q.-Z., Vo, M., Malik, J.: Long-term human motion prediction with scene context. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12346, pp. 387–404. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58452-8_23

  86. Zhou, K., Bhatnagar, B.L., Lenssen, J.E., Pons-Moll, G.: Toch: spatio-temporal object correspondence to hand for motion refinement. In: European Conference on Computer Vision (ECCV). Springer, October 2022

Acknowledgements

We would like to thank RVH group members [2] for their helpful discussions. Special thanks to Beiyang Li for preparing the supplementary material. This work is funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) - 409792180 (Emmy Noether Programme, project: Real Virtual Humans) and the German Federal Ministry of Education and Research (BMBF): Tübingen AI Center, FKZ: 01IS18039A. Gerard Pons-Moll is a member of the Machine Learning Cluster of Excellence, EXC number 2064/1 - Project number 390727645.

Author information

Corresponding author

Correspondence to Xianghui Xie.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 18822 KB)

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Xie, X., Bhatnagar, B.L., Pons-Moll, G. (2022). CHORE: Contact, Human and Object Reconstruction from a Single RGB Image. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13662. Springer, Cham. https://doi.org/10.1007/978-3-031-20086-1_8

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-20086-1_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20085-4

  • Online ISBN: 978-3-031-20086-1

  • eBook Packages: Computer Science, Computer Science (R0)
