
Development of a robust cascaded architecture for intelligent robot grasping using limited labelled data

  • Original Paper
  • Published:
Machine Vision and Applications

Abstract

Grasping objects intelligently is a challenging task even for humans, and we spend a considerable part of our childhood learning to grasp objects correctly. Robots, however, cannot be given that much time to learn how to grasp effectively. In the present research we therefore propose an efficient learning architecture based on VQVAE, so that robots can be taught to grasp with a sufficient amount of data corresponding to correct grasps. Obtaining sufficient labelled data is extremely difficult in the robot grasping domain, so we investigate a semi-supervised learning-based model that generalizes well even with a limited labelled data set. Its performance shows a 6% improvement over existing state-of-the-art models, including our earlier model. During experimentation, we observed that the proposed model, RGGCNN2, performs significantly better than existing approaches that do not use unlabelled data for generating grasping rectangles, both for isolated objects and for objects in a cluttered environment. To the best of our knowledge, an intelligent robot grasping model trained through semi-supervised representation learning, which exploits the high-quality learning ability of the GGCNN2 architecture with a limited labelled data set together with the learned latent embeddings, can serve as a de facto training method; this is established and validated in this paper through rigorous hardware experiments using the Baxter (Anukul) research robot (video demonstration).
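As a concrete illustration of the cascade described above, the following minimal PyTorch sketch (not the authors' implementation; all module names, layer sizes and shapes are illustrative assumptions) shows how a vector-quantized encoder, trainable on unlabelled depth images, can feed a GGCNN2-style fully convolutional head that predicts per-pixel grasp quality, angle and width maps:

    # Minimal sketch of the cascaded idea: a VQ encoder learns latent embeddings
    # and a GGCNN2-style head predicts per-pixel grasp maps from those embeddings.
    # Module names, sizes and shapes are illustrative assumptions, not the paper's code.
    import torch
    import torch.nn as nn

    class VectorQuantizer(nn.Module):
        """Nearest-neighbour codebook lookup with straight-through gradients."""
        def __init__(self, num_codes=64, code_dim=32):
            super().__init__()
            self.codebook = nn.Embedding(num_codes, code_dim)
            self.codebook.weight.data.uniform_(-1.0 / num_codes, 1.0 / num_codes)

        def forward(self, z):                               # z: (B, C, H, W)
            b, c, h, w = z.shape
            flat = z.permute(0, 2, 3, 1).reshape(-1, c)     # (B*H*W, C)
            idx = torch.cdist(flat, self.codebook.weight).argmin(dim=1)
            zq = self.codebook(idx).view(b, h, w, c).permute(0, 3, 1, 2)
            return z + (zq - z).detach()                    # straight-through estimator

    class GraspHead(nn.Module):
        """GGCNN2-like head: per-pixel quality, angle (cos/sin of 2*theta) and width maps."""
        def __init__(self, in_ch=32):
            super().__init__()
            self.trunk = nn.Sequential(
                nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(),
                nn.Conv2d(16, 16, 3, padding=1), nn.ReLU())
            self.quality = nn.Conv2d(16, 1, 1)
            self.cos2 = nn.Conv2d(16, 1, 1)
            self.sin2 = nn.Conv2d(16, 1, 1)
            self.width = nn.Conv2d(16, 1, 1)

        def forward(self, z):
            f = self.trunk(z)
            return self.quality(f), self.cos2(f), self.sin2(f), self.width(f)

    encoder = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
                            nn.Conv2d(32, 32, 3, padding=1))
    vq, head = VectorQuantizer(), GraspHead()

    depth = torch.randn(2, 1, 300, 300)      # batch of depth images
    z_q = vq(encoder(depth))                 # quantized latent embedding
    q, c, s, w = head(z_q)                   # per-pixel grasp maps
    print(q.shape)                           # torch.Size([2, 1, 300, 300])

In a semi-supervised setting of this kind, the encoder and codebook can be trained on the full, largely unlabelled image set, while only the grasp head requires the limited labelled grasping rectangles.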



Abbreviations

VAE: Variational auto-encoder

VQVAE: Vector-quantized VAE

CNN: Convolutional neural network

GGCNN: Generative grasp CNN

GGCNN2: Generative grasp CNN-2

RGGCNN: Representation-based GGCNN

RGGCNN2: Representation-based GGCNN2


Acknowledgements

The present research is partially funded by the I-Hub foundation for Cobotics (Technology Innovation Hub of IIT-Delhi set up by the Department of Science and Technology, Govt. of India).

Author information

Corresponding author

Correspondence to Priya Shukla.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Shukla, P., Kushwaha, V. & Nandi, G.C. Development of a robust cascaded architecture for intelligent robot grasping using limited labelled data. Machine Vision and Applications 34, 99 (2023). https://doi.org/10.1007/s00138-023-01459-2
