Abstract
Training computers to understand, model, and synthesize human grasping requires a rich dataset containing complex 3D object shapes, detailed contact information, hand pose and shape, and the 3D body motion over time. While “grasping” is commonly thought of as a single hand stably lifting an object, we capture the motion of the entire body and adopt the generalized notion of “whole-body grasps”. Thus, we collect a new dataset of whole-body grasps, called GRAB (GRasping Actions with Bodies), containing full 3D shape and pose sequences of 10 subjects interacting with 51 everyday objects of varying shape and size. Given MoCap markers, we fit the full 3D body shape and pose, including the articulated face and hands, as well as the 3D object pose. This gives detailed 3D meshes over time, from which we compute contact between the body and object. This unique dataset goes well beyond existing ones for modeling and understanding how humans grasp and manipulate objects, how their full body is involved, and how interaction varies with the task. We illustrate the practical value of GRAB with an example application: we train GrabNet, a conditional generative network, to predict 3D hand grasps for unseen 3D object shapes. The dataset and code are available for research purposes at https://grab.is.tue.mpg.de.
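The abstract does not say how contact is derived from the fitted meshes. As a rough illustration only, the sketch below assumes contact can be approximated by vertex-to-vertex proximity: a body vertex counts as "in contact" when it lies within a distance threshold of some object vertex. The function names, the 5 mm threshold, and the toy vertex data are all hypothetical and are not taken from the paper or its released code.

```python
from math import dist
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def contact_vertices(
    body_vertices: Sequence[Point],
    object_vertices: Sequence[Point],
    threshold: float = 0.005,
) -> List[int]:
    """Return indices of body vertices lying within `threshold` of the object.

    Assumption: contact is approximated by vertex-to-vertex proximity.
    The 5 mm default (in metres) is an illustrative value, not a figure
    from the paper. `object_vertices` must be non-empty.
    """
    in_contact = []
    for i, b in enumerate(body_vertices):
        # Distance from this body vertex to its nearest object vertex.
        nearest = min(dist(b, o) for o in object_vertices)
        if nearest <= threshold:
            in_contact.append(i)
    return in_contact


def contact_over_time(
    body_frames: Sequence[Sequence[Point]],
    object_frames: Sequence[Sequence[Point]],
    threshold: float = 0.005,
) -> List[List[int]]:
    """Apply the per-frame proximity test to two equally long mesh sequences."""
    return [
        contact_vertices(b, o, threshold)
        for b, o in zip(body_frames, object_frames)
    ]


# Toy example: three "body" vertices and a two-vertex "object" over two frames.
body = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.2, 0.0, 0.0)]
obj_far = [(0.1, 0.05, 0.0), (0.3, 0.05, 0.0)]    # object 5 cm away
obj_near = [(0.1, 0.002, 0.0), (0.3, 0.05, 0.0)]  # object 2 mm from vertex 1

contacts = contact_over_time([body, body], [obj_far, obj_near])
print(contacts)  # [[], [1]]
```

The brute-force nearest-vertex search is quadratic in the number of vertices, so for real body and object meshes a spatial acceleration structure would likely be needed; the sketch only shows the thresholding idea.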
Acknowledgements
We thank S. Polikovsky, M. Höschle (MH) and M. Landry (ML) for the MoCap facility. We thank F. Mattioni, D. Hieber, and A. Valis for MoCap cleaning. We thank ML and T. Alexiadis for trial coordination, MH and F. Grimminger for 3D printing, V. Callaghan for voice recordings and J. Tesch for renderings. Disclosure: In the last five years, MJB has received research gift funds from Intel, Nvidia, Facebook, and Amazon. He is a co-founder and investor in Meshcapade GmbH, which commercializes 3D body shape technology. While MJB is a part-time employee of Amazon, his research was performed solely at, and funded solely by, MPI.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Taheri, O., Ghorbani, N., Black, M.J., Tzionas, D. (2020). GRAB: A Dataset of Whole-Body Human Grasping of Objects. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12349. Springer, Cham. https://doi.org/10.1007/978-3-030-58548-8_34
DOI: https://doi.org/10.1007/978-3-030-58548-8_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58547-1
Online ISBN: 978-3-030-58548-8
eBook Packages: Computer Science, Computer Science (R0)