
GRAB: A Dataset of Whole-Body Human Grasping of Objects

  • Conference paper
  • In: Computer Vision – ECCV 2020 (ECCV 2020)

Part of the book series: Lecture Notes in Computer Science (LNCS, vol. 12349)

Abstract

Training computers to understand, model, and synthesize human grasping requires a rich dataset containing complex 3D object shapes, detailed contact information, hand pose and shape, and the 3D body motion over time. While “grasping” is commonly thought of as a single hand stably lifting an object, we capture the motion of the entire body and adopt the generalized notion of “whole-body grasps”. Thus, we collect a new dataset, called GRAB (GRasping Actions with Bodies), of whole-body grasps, containing full 3D shape and pose sequences of 10 subjects interacting with 51 everyday objects of varying shape and size. Given MoCap markers, we fit the full 3D body shape and pose, including the articulated face and hands, as well as the 3D object pose. This gives detailed 3D meshes over time, from which we compute contact between the body and object. This is a unique dataset that goes well beyond existing ones for modeling and understanding how humans grasp and manipulate objects, how their full body is involved, and how interaction varies with the task. We illustrate the practical value of GRAB with an example application: we train GrabNet, a conditional generative network, to predict 3D hand grasps for unseen 3D object shapes. The dataset and code are available for research purposes at https://grab.is.tue.mpg.de.
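To make the contact computation concrete, the sketch below shows one way to derive per-frame contact from the fitted meshes: mark every body vertex that lies within a small distance of the object surface. This is a minimal illustration, not the authors' code; the 2 mm threshold and the file names are hypothetical assumptions, and trimesh is used here simply as a convenient mesh library.

```python
import numpy as np
import trimesh

def contact_vertices(body_mesh, object_mesh, threshold=0.002):
    """Indices of body vertices within `threshold` meters of the object surface.

    `threshold` (2 mm here) is an illustrative value, not the one used in the paper.
    """
    # Unsigned distance from each body vertex to its closest point on the object mesh.
    _, distances, _ = trimesh.proximity.closest_point(object_mesh, body_mesh.vertices)
    return np.where(distances < threshold)[0]

# Hypothetical inputs: one fitted body mesh and the tracked object mesh for a frame.
body = trimesh.load('body_frame.ply')
obj = trimesh.load('object.ply')
print(f'{len(contact_vertices(body, obj))} body vertices in contact')
```

Applied per frame over a sequence, this kind of thresholded closest-point test yields time-varying contact annotations of the sort the abstract describes.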



Acknowledgements

We thank S. Polikovsky, M. Höschle (MH) and M. Landry (ML) for the MoCap facility. We thank F. Mattioni, D. Hieber, and A. Valis for MoCap cleaning. We thank ML and T. Alexiadis for trial coordination, MH and F. Grimminger for 3D printing, V. Callaghan for voice recordings and J. Tesch for renderings. Disclosure: In the last five years, MJB has received research gift funds from Intel, Nvidia, Facebook, and Amazon. He is a co-founder and investor in Meshcapade GmbH, which commercializes 3D body shape technology. While MJB is a part-time employee of Amazon, his research was performed solely at, and funded solely by, MPI.

Author information

Corresponding author

Correspondence to Omid Taheri.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 6536 KB)


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Taheri, O., Ghorbani, N., Black, M.J., Tzionas, D. (2020). GRAB: A Dataset of Whole-Body Human Grasping of Objects. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) Computer Vision – ECCV 2020. Lecture Notes in Computer Science, vol. 12349. Springer, Cham. https://doi.org/10.1007/978-3-030-58548-8_34


  • DOI: https://doi.org/10.1007/978-3-030-58548-8_34

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58547-1

  • Online ISBN: 978-3-030-58548-8

  • eBook Packages: Computer Science (R0)
