
Grasp Configuration Synthesis from 3D Point Clouds with Attention Mechanism

  • Regular paper
  • Journal of Intelligent & Robotic Systems

Abstract

Grasp generation is a crucial task in robotics, especially in unstructured environments, where robots must identify suitable grasp locations on objects and determine the corresponding grasp configurations. Recent advances in deep learning have produced end-to-end models for 6-DOF grasp generation that learn to map input point clouds directly to grasp configurations without intermediate processing steps. However, these models often treat all points in a scene equally, leading to suboptimal results in cluttered contexts, where the informativeness of points varies widely because of occlusion. While attention mechanisms have shown promise in improving the accuracy and efficiency of various tasks in occluded scenes, their effectiveness for grasp generation remains an active area of research. Motivated by this potential, we explore how attention mechanisms can improve grasp generation from 3D point clouds. Building on our previous work, VoteGrasp (2022), we integrate a wide range of attention modules and compare their effects and characteristics to identify the most successful combination for enhancing grasp generation performance. We also extend VoteGrasp by adding a semantic object classification term to the loss function, making our method more flexible than existing approaches. Through detailed experiments and analysis, our research provides valuable insights into the use of attention mechanisms for 3D point cloud grasp generation, highlighting their potential to improve the accuracy and efficiency of robotic systems.
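
The full text is access-restricted here, but the core mechanism named in the abstract can be sketched from first principles. The snippet below is a minimal, generic scaled dot-product self-attention block over per-point features; the paper compares several attention variants, so this is an assumed stand-in rather than the authors' implementation, and all shapes, names, and toy projection matrices are illustrative.

    # Minimal sketch (not the authors' code): scaled dot-product self-attention
    # over per-point features, the kind of module the paper compares for
    # re-weighting points in cluttered, occluded scenes.
    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def point_self_attention(feats, Wq, Wk, Wv):
        """feats: (N, C) per-point features; Wq, Wk, Wv: (C, D) learned projections."""
        q, k, v = feats @ Wq, feats @ Wk, feats @ Wv   # queries, keys, values: (N, D)
        scores = (q @ k.T) / np.sqrt(k.shape[-1])      # (N, N) pairwise point affinities
        attn = softmax(scores, axis=-1)                # each point attends to all others
        return attn @ v                                # (N, D) context-weighted features

    # Toy usage: 1024 points with 128-dim features (sizes are assumptions).
    rng = np.random.default_rng(0)
    N, C, D = 1024, 128, 128
    feats = rng.standard_normal((N, C))
    Wq, Wk, Wv = (0.02 * rng.standard_normal((C, D)) for _ in range(3))
    print(point_self_attention(feats, Wq, Wk, Wv).shape)  # (1024, 128)

Under the same caveat, the extended objective described in the abstract can be read as L_total = L_grasp + λ · L_cls, where L_cls is the added semantic object classification term and λ is a weighting hyperparameter whose value the abstract does not state.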


Data Availability

The DexYCB and FPHAB datasets are publicly available at https://dex-ycb.github.io/ and https://guiggh.github.io/publications/first-person-hands/, respectively.

Code Availability

Not applicable.

References

  1. Besl, P.J., McKay, N.D.: Method for registration of 3-d shapes. In: Sensor Fusion IV: Control Paradigms and Data Structures, SPIE, pp 586–606 (1992)

  2. Bohg, J., Morales, A., Asfour, T., et al.: Data-driven grasp synthesis-a survey. IEEE Trans. Robot. 30(2), 289–309 (2013)


  3. Buades, A., Coll, B., Morel, J.M.: A non-local algorithm for image denoising. In: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), IEEE, pp 60–65 (2005)

  4. Calli, B., Singh, A., Bruce, J., et al.: Yale-cmu-berkeley dataset for robotic manipulation research. Int. J. Robot. Res. 36(3), 261–268 (2017)


  5. Choi, C., Schwarting, W., DelPreto, J., et al.: Learning object grasping for soft robot hands. IEEE Robot. Automat. Lett. 3(3), 2370–2377 (2018)


  6. Chu, F.J., Xu, R., Vela, P.A.: Real-world multiobject, multigrasp detection. IEEE Robot. Automat. Lett. 3(4), 3355–3362 (2018)


  7. Ciocarlie, M., Goldfeder, C., Allen, P.: Dimensionality reduction for hand-independent dexterous robotic grasping. In: 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems, IEEE, pp 3270–3275 (2007)

  8. Deng, H., Birdal, T., Ilic, S.: Ppfnet: Global context aware local features for robust 3d point matching. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 195–205 (2018)

  9. Dias, A.S., Brites, C., Ascenso, J., et al.: Sift-based homographies for efficient multiview distributed visual sensing. IEEE Sens. J. 15(5), 2643–2656 (2014)


  10. Fang, H.S., Wang, C., Gou, M., et al.: Graspnet-1billion: A large-scale benchmark for general object grasping. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 11,444–11,453 (2020)

  11. Feng, M., Zhang, L., Lin, X., et al.: Point attention network for semantic segmentation of 3d point clouds. Pattern Recognit. 107, 107,446 (2020)

  12. Fu, J., Liu, J., Tian, H., et al.: Dual attention network for scene segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 3146–3154 (2019)

  13. Gou, M., Fang, H.S., Zhu, Z., et al.: Rgb matters: Learning 7-dof grasp poses on monocular rgbd images. In: 2021 IEEE International Conference on Robotics and Automation (ICRA), IEEE, pp 13,459–13,466 (2021)

  14. He, Y., Huang, H., Fan, H., et al.: Ffb6d: A full flow bidirectional fusion network for 6d pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 3003–3013 (2021)

  15. Hoang, D.C., Stork, J.A., Stoyanov, T.: Context-aware grasp generation in cluttered scenes. In: 2022 IEEE International Conference on Robotics and Automation (ICRA), Philadelphia, USA, May 23–27 (2022)

  16. Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 7132–7141 (2018)

  17. Hu, S.M., Cai, J.X., Lai, Y.K.: Semantic labeling and instance segmentation of 3d point clouds using patch context analysis and multiscale processing. IEEE Trans. Vis. Comput. Graph. 26(7), 2485–2498 (2018)


  18. Huang, Z., Wang, X., Huang, L., et al.: Ccnet: Criss-cross attention for semantic segmentation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp 603–612 (2019)

  19. Lenz, I., Lee, H., Saxena, A.: Deep learning for detecting robotic grasps. Int. J. Robot. Res. 34(4–5), 705–724 (2015)


  20. Liang, H., Ma, X., Li, S., et al.: Pointnetgpd: Detecting grasp configurations from point sets. In: 2019 International Conference on Robotics and Automation (ICRA), IEEE, pp 3629–3635 (2019)

  21. Mahler, J., Matl, M., Liu, X., et al.: Dex-net 3.0: Computing robust vacuum suction grasp targets in point clouds using a new analytic model and deep learning. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), IEEE, pp 5620–5627 (2018)

  22. Miller, A.T., Allen, P.K.: Graspit! a versatile simulator for robotic grasping. IEEE Robot. Automat. Mag. 11(4), 110–122 (2004)


  23. Morrison, D., Corke, P., Leitner, J.: Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv:1804.05172 (2018)

  24. Mousavian, A., Eppner, C., Fox, D.: 6-dof graspnet: Variational grasp generation for object manipulation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp 2901–2910 (2019)

  25. Muñoz, E., Konishi, Y., Murino, V., et al.: Fast 6d pose estimation for texture-less objects from a single rgb image. In: 2016 IEEE International Conference on Robotics and Automation (ICRA), IEEE, pp 5623–5630 (2016)

  26. Ni, P., Zhang, W., Zhu, X., et al.: Pointnet++ grasping: learning an end-to-end spatial grasp generation algorithm from sparse point clouds. In: 2020 IEEE International Conference on Robotics and Automation (ICRA), IEEE, pp 3619–3625 (2020)

  27. Paigwar, A., Erkent, O., Wolf, C., et al.: Attentional pointnet for 3d-object detection in point clouds. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (2019)

  28. Ten Pas, A., Gualtieri, M., Saenko, K., et al.: Grasp pose detection in point clouds. Int. J. Robot. Res. 36(13–14), 1455–1473 (2017)

  29. Qi, C.R., Su, H., Mo, K., et al.: Pointnet: Deep learning on point sets for 3d classification and segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 652–660 (2017a)

  30. Qi, C.R., Yi, L., Su, H., et al.: Pointnet++: Deep hierarchical feature learning on point sets in a metric space. Adv. Neural Inf. Process. Syst. 30 (2017b)

  31. Qi, C.R., Litany, O., He, K., et al.: Deep hough voting for 3d object detection in point clouds. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp 9277–9286 (2019)

  32. Redmon, J., Angelova, A.: Real-time grasp detection using convolutional neural networks. In: 2015 IEEE International Conference on Robotics and Automation (ICRA), IEEE, pp 1316–1322 (2015)

  33. Shi, Y., Chang, A.X., Wu, Z., et al.: Hierarchy denoising recursive autoencoders for 3d scene layout prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 1771–1780 (2019)

  34. Wang, C., Xu, D., Zhu, Y., et al.: Densefusion: 6d object pose estimation by iterative dense fusion. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 3343–3352 (2019)

  35. Wang, X., Girshick, R., Gupta, A., et al.: Non-local neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 7794–7803 (2018)

  36. Woo, S., Park, J., Lee, J.Y., et al.: Cbam: Convolutional block attention module. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 3–19 (2018)

  37. Wu, D., Zhuang, Z., Xiang, C., et al.: 6d-vnet: End-to-end 6-dof vehicle pose estimation from monocular rgb images. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (2019)

  38. Xie, S., Liu, S., Chen, Z., et al.: Attentional shapecontextnet for point cloud recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 4606–4615 (2018)

  39. Ye, X., Li, J., Huang, H., et al.: 3d recurrent neural networks with context fusion for point cloud semantic segmentation. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 403–417 (2018)

  40. Yue, K., Sun, M., Yuan, Y., et al.: Compact generalized non-local network. In: Advances in Neural Information Processing Systems, pp 6510–6519 (2018)

  41. Zeng, A., Yu, K.T., Song, S., et al.: Multi-view self-supervised deep learning for 6d pose estimation in the Amazon Picking Challenge. In: 2017 IEEE International Conference on Robotics and Automation (ICRA), IEEE, pp 1386–1383 (2017)

  42. Zhang, W., Xiao, C.: Pcan: 3d attention map learning using contextual information for point cloud based retrieval. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 12,436–12,445 (2019)

  43. Zhao, H., Jiang, L., Jia, J., et al.: Point transformer. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp 16,259–16,268 (2021)


Author information


Contributions

All the authors conceived the research, designed and implemented the algorithm, and drafted the submitted version of the paper.

Corresponding author

Correspondence to Dinh-Cuong Hoang.

Ethics declarations

Conflicts of Interest

The authors declare that they have no conflict of interest.

Ethics approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Hoang, DC., Nguyen, AN., Vu, VD. et al. Grasp Configuration Synthesis from 3D Point Clouds with Attention Mechanism. J Intell Robot Syst 109, 71 (2023). https://doi.org/10.1007/s10846-023-02007-w

