Dealing with Ambiguity in Robotic Grasping via Multiple Predictions

Ghazaei, Ghazal; Laina, Iro; Rupprecht, Christian; Tombari, Federico; Navab, Nassir; Nazarpour, Kianoush

doi:10.1007/978-3-030-20870-7_3

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11364)

Included in the following conference series:

Asian Conference on Computer Vision

Abstract

Humans excel at grasping and manipulating objects because of their life-long experience and knowledge about the 3D shape and weight distribution of objects. The lack of such intuition in robots makes robotic grasping an exceptionally challenging task. There are often several equally viable ways to grasp an object, yet this ambiguity is not modeled in conventional systems that estimate a single, optimal grasp position. We propose to tackle this problem by simultaneously estimating multiple grasp poses from a single RGB image of the target object. Further, we reformulate the problem of robotic grasping by replacing conventional grasp rectangles with grasp belief maps, which hold more precise location information than a rectangle and account for the uncertainty inherent to the task. We augment a fully convolutional neural network with a multiple hypothesis prediction model that predicts a set of grasp hypotheses in under 60 ms, which is critical for real-time robotic applications. The grasp detection accuracy reaches over 90% for unseen objects, outperforming the current state of the art on this task.
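The two ingredients named in the abstract, grasp belief maps and a multiple-hypothesis output, can be illustrated with a small NumPy sketch. It is not the authors' implementation. It assumes that one grasp is rendered as an oriented 2D Gaussian (the exact belief-map encoding is not given in this record), and it uses the relaxed winner-takes-all loss from the multiple hypothesis prediction framework of Rupprecht et al. (ICCV 2017); whether this chapter uses exactly that loss, and the value of epsilon, are assumptions. All function names and parameters (`grasp_belief_map`, `relaxed_wta_loss`, `decode_peak`, `sigma_major`, `sigma_minor`, `epsilon`) are hypothetical.

```python
import numpy as np


def grasp_belief_map(height, width, center, angle, sigma_major, sigma_minor):
    # Hypothetical encoding: one grasp as an oriented 2D Gaussian centred at
    # center = (x, y), with its major axis along `angle` (radians).
    ys, xs = np.mgrid[0:height, 0:width].astype(np.float64)
    dx, dy = xs - center[0], ys - center[1]
    c, s = np.cos(angle), np.sin(angle)
    u = c * dx + s * dy   # offset along the grasp axis
    v = -s * dx + c * dy  # offset across the grasp axis
    return np.exp(-0.5 * ((u / sigma_major) ** 2 + (v / sigma_minor) ** 2))


def relaxed_wta_loss(hypotheses, target, epsilon=0.05):
    # hypotheses: (M, H, W) predicted belief maps; target: (H, W) ground truth.
    # The best-matching hypothesis is weighted by (1 - epsilon); the other
    # M - 1 hypotheses share epsilon equally.
    errors = ((hypotheses - target[None]) ** 2).mean(axis=(1, 2))
    best = int(np.argmin(errors))
    m = len(errors)
    if m == 1:
        return float(errors[0]), best
    weights = np.full(m, epsilon / (m - 1))
    weights[best] = 1.0 - epsilon
    return float(np.sum(weights * errors)), best


def decode_peak(belief_map):
    # Pixel (x, y) with the strongest response in a single belief map.
    y, x = np.unravel_index(np.argmax(belief_map), belief_map.shape)
    return int(x), int(y)


# Usage: a 32x32 target and three hypothetical network outputs.
target = grasp_belief_map(32, 32, (10, 20), 0.3, 4.0, 2.0)
hyps = np.stack([
    grasp_belief_map(32, 32, (25, 5), 1.2, 4.0, 2.0),   # a different grasp
    grasp_belief_map(32, 32, (11, 20), 0.3, 4.0, 2.0),  # close to the target
    np.zeros((32, 32)),                                  # an empty prediction
])
loss, best = relaxed_wta_loss(hyps, target)
print(best, decode_peak(target))  # prints: 1 (10, 20)
```

Giving the non-winning hypotheses a small share of the loss keeps them from going untrained; with `epsilon=0` the loss reduces to a hard winner-takes-all. Under this framing, each training step would compare the hypotheses against one ground-truth grasp (an assumption here), so that different heads can specialize on different valid grasps of the same object.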

Acknowledgments

This work is supported by the UK Engineering and Physical Sciences Research Council (EP/R004242/1). We also gratefully acknowledge the support of NVIDIA Corporation with the donation of a Titan Xp GPU used for the experiments.

Author information

Authors and Affiliations

School of Engineering, Newcastle University, Newcastle, UK
Ghazal Ghazaei & Kianoush Nazarpour
Technische Universität München, Munich, Germany
Ghazal Ghazaei, Iro Laina, Christian Rupprecht, Federico Tombari & Nassir Navab
Institute of Neuroscience, Newcastle University, Newcastle, UK
Kianoush Nazarpour

Corresponding author

Correspondence to Ghazal Ghazaei.

Editor information

Editors and Affiliations

IIIT Hyderabad, Hyderabad, India
C.V. Jawahar
ANU, Canberra, ACT, Australia
Hongdong Li
Simon Fraser University, Burnaby, BC, Canada
Greg Mori
ETH Zurich, Zurich, Zürich, Switzerland
Konrad Schindler

About this paper

Cite this paper

Ghazaei, G., Laina, I., Rupprecht, C., Tombari, F., Navab, N., Nazarpour, K. (2019). Dealing with Ambiguity in Robotic Grasping via Multiple Predictions. In: Jawahar, C., Li, H., Mori, G., Schindler, K. (eds) Computer Vision – ACCV 2018. ACCV 2018. Lecture Notes in Computer Science, vol 11364. Springer, Cham. https://doi.org/10.1007/978-3-030-20870-7_3

DOI: https://doi.org/10.1007/978-3-030-20870-7_3
Published: 25 May 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-20869-1
Online ISBN: 978-3-030-20870-7
eBook Packages: Computer Science, Computer Science (R0)