
Grasp Configuration Synthesis from 3D Point Clouds with Attention Mechanism

  • Regular paper
  • Journal of Intelligent & Robotic Systems

Abstract

Grasp generation is a crucial task in robotics, especially in unstructured environments, where robots must identify suitable grasp locations on objects and determine the corresponding grasp configurations. Recent advances in deep learning have produced end-to-end models for 6-DOF grasp generation that learn to map input point clouds directly to grasp configurations without intermediate processing steps. However, these models often treat all points in a scene equally, leading to suboptimal results in cluttered contexts, where the informativeness of points varies widely because of occlusion. While attention mechanisms have shown promise in improving the accuracy and efficiency of various tasks in occluded scenes, their effectiveness for grasp generation remains an active area of research. Motivated by this potential, we explore how attention mechanisms can improve grasp generation from 3D point clouds. Building on our previous work, VoteGrasp (2022), we integrate a wide range of attention modules and compare their effects and characteristics to identify the most successful combination for enhancing grasp generation performance. We also extend VoteGrasp by adding a semantic object classification term to the loss function, making our method more flexible than existing approaches. Through detailed experiments and analysis, our research provides valuable insights into the use of attention mechanisms for 3D point cloud grasp generation, highlighting their potential to improve the accuracy and efficiency of robotic systems.
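
The full text is access-restricted here, but the core mechanism named in the abstract can be sketched from first principles. The snippet below is a minimal, generic scaled dot-product self-attention block over per-point features; the paper compares several attention variants, so this is an assumed stand-in rather than the authors' implementation, and all shapes, names, and toy projection matrices are illustrative.

    # Minimal sketch (not the authors' code): scaled dot-product self-attention
    # over per-point features, the kind of module the paper compares for
    # re-weighting points in cluttered, occluded scenes.
    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def point_self_attention(feats, Wq, Wk, Wv):
        """feats: (N, C) per-point features; Wq, Wk, Wv: (C, D) learned projections."""
        q, k, v = feats @ Wq, feats @ Wk, feats @ Wv   # queries, keys, values: (N, D)
        scores = (q @ k.T) / np.sqrt(k.shape[-1])      # (N, N) pairwise point affinities
        attn = softmax(scores, axis=-1)                # each point attends to all others
        return attn @ v                                # (N, D) context-weighted features

    # Toy usage: 1024 points with 128-dim features (sizes are assumptions).
    rng = np.random.default_rng(0)
    N, C, D = 1024, 128, 128
    feats = rng.standard_normal((N, C))
    Wq, Wk, Wv = (0.02 * rng.standard_normal((C, D)) for _ in range(3))
    print(point_self_attention(feats, Wq, Wk, Wv).shape)  # (1024, 128)

Under the same caveat, the extended objective described in the abstract can be read as L_total = L_grasp + λ · L_cls, where L_cls is the added semantic object classification term and λ is a weighting hyperparameter whose value the abstract does not state.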


Data Availability

The DexYCB and FPHAB datasets are publicly available at https://dex-ycb.github.io/ and https://guiggh.github.io/publications/first-person-hands/, respectively.

Code Availability

Not applicable.

References

  1. Besl, P.J., McKay, N.D.: Method for registration of 3-d shapes. In: Sensor Fusion IV: Control Paradigms and Data Structures, SPIE, pp 586–606 (1992)

  2. Bohg, J., Morales, A., Asfour, T., et al.: Data-driven grasp synthesis-a survey. IEEE Trans. Robot. 30(2), 289–309 (2013)


  3. Buades, A., Coll, B., Morel, J.M.: A non-local algorithm for image denoising. In: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), IEEE, pp 60–65 (2005)

  4. Calli, B., Singh, A., Bruce, J., et al.: Yale-cmu-berkeley dataset for robotic manipulation research. Int. J. Robot. Res. 36(3), 261–268 (2017)


  5. Choi, C., Schwarting, W., DelPreto, J., et al.: Learning object grasping for soft robot hands. IEEE Robot. Automat. Lett. 3(3), 2370–2377 (2018)


  6. Chu, F.J., Xu, R., Vela, P.A.: Real-world multiobject, multigrasp detection. IEEE Robot. Automat. Lett. 3(4), 3355–3362 (2018)


  7. Ciocarlie, M., Goldfeder, C., Allen, P.: Dimensionality reduction for hand-independent dexterous robotic grasping. In: 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems, IEEE, pp 3270–3275 (2007)

  8. Deng, H., Birdal, T., Ilic, S.: Ppfnet: Global context aware local features for robust 3d point matching. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 195–205 (2018)

  9. Dias, A.S., Brites, C., Ascenso, J., et al.: Sift-based homographies for efficient multiview distributed visual sensing. IEEE Sens. J. 15(5), 2643–2656 (2014)


  10. Fang, H.S., Wang, C., Gou, M., et al.: Graspnet-1billion: A large-scale benchmark for general object grasping. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 11,444–11,453 (2020)

  11. Feng, M., Zhang, L., Lin, X., et al.: Point attention network for semantic segmentation of 3d point clouds. Pattern Recognit. 107, 107,446 (2020)

  12. Fu, J., Liu, J., Tian, H., et al.: Dual attention network for scene segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 3146–3154 (2019)

  13. Gou, M., Fang, H.S., Zhu, Z., et al.: Rgb matters: Learning 7-dof grasp poses on monocular rgbd images. In: 2021 IEEE International Conference on Robotics and Automation (ICRA), IEEE, pp 13,459–13,466 (2021)

  14. He, Y., Huang, H., Fan, H., et al.: Ffb6d: A full flow bidirectional fusion network for 6d pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 3003–3013 (2021)

  15. Hoang, D.C., Stork, J.A., Stoyanov, T.: Context-aware grasp generation in cluttered scenes. In: 2022 IEEE International Conference on Robotics and Automation (ICRA), Philadelphia, USA, May 23–27 (2022)

  16. Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 7132–7141 (2018)

  17. Hu, S.M., Cai, J.X., Lai, Y.K.: Semantic labeling and instance segmentation of 3d point clouds using patch context analysis and multiscale processing. IEEE Trans. Vis. Comput. Graph. 26(7), 2485–2498 (2018)


  18. Huang, Z., Wang, X., Huang, L., et al.: Ccnet: Criss-cross attention for semantic segmentation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp 603–612 (2019)

  19. Lenz, I., Lee, H., Saxena, A.: Deep learning for detecting robotic grasps. Int. J. Robot. Res. 34(4–5), 705–724 (2015)


  20. Liang, H., Ma, X., Li, S., et al.: Pointnetgpd: Detecting grasp configurations from point sets. In: 2019 International Conference on Robotics and Automation (ICRA), IEEE, pp 3629–3635 (2019)

  21. Mahler, J., Matl, M., Liu, X., et al.: Dex-net 3.0: Computing robust vacuum suction grasp targets in point clouds using a new analytic model and deep learning. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), IEEE, pp 5620–5627 (2018)

  22. Miller, A.T., Allen, P.K.: Graspit! a versatile simulator for robotic grasping. IEEE Robot. Automat. Mag. 11(4), 110–122 (2004)


  23. Morrison, D., Corke, P., Leitner, J.: Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv:1804.05172 (2018)

  24. Mousavian, A., Eppner, C., Fox, D.: 6-dof graspnet: Variational grasp generation for object manipulation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp 2901–2910 (2019)

  25. Muñoz, E., Konishi, Y., Murino, V., et al.: Fast 6d pose estimation for texture-less objects from a single rgb image. In: 2016 IEEE International Conference on Robotics and Automation (ICRA), IEEE, pp 5623–5630 (2016)

  26. Ni, P., Zhang, W., Zhu, X., et al.: Pointnet++ grasping: learning an end-to-end spatial grasp generation algorithm from sparse point clouds. In: 2020 IEEE International Conference on Robotics and Automation (ICRA), IEEE, pp 3619–3625 (2020)

  27. Paigwar, A., Erkent, O., Wolf, C., et al.: Attentional pointnet for 3d-object detection in point clouds. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (2019)

  28. Ten Pas, A., Gualtieri, M., Saenko, K., et al.: Grasp pose detection in point clouds. Int. J. Robot. Res. 36(13–14), 1455–1473 (2017)

  29. Qi, C.R., Su, H., Mo, K., et al.: Pointnet: Deep learning on point sets for 3d classification and segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 652–660 (2017a)

  30. Qi, C.R., Yi, L., Su, H., et al.: Pointnet++: Deep hierarchical feature learning on point sets in a metric space. Adv. Neural Inf. Process. Syst. 30 (2017b)

  31. Qi, C.R., Litany, O., He, K., et al.: Deep hough voting for 3d object detection in point clouds. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp 9277–9286 (2019)

  32. Redmon, J., Angelova, A.: Real-time grasp detection using convolutional neural networks. In: 2015 IEEE International Conference on Robotics and Automation (ICRA), IEEE, pp 1316–1322 (2015)

  33. Shi, Y., Chang, A.X., Wu, Z., et al.: Hierarchy denoising recursive autoencoders for 3d scene layout prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 1771–1780 (2019)

  34. Wang, C., Xu, D., Zhu, Y., et al.: Densefusion: 6d object pose estimation by iterative dense fusion. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 3343–3352 (2019)

  35. Wang, X., Girshick, R., Gupta, A., et al.: Non-local neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 7794–7803 (2018)

  36. Woo, S., Park, J., Lee, J.Y., et al.: Cbam: Convolutional block attention module. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 3–19 (2018)

  37. Wu, D., Zhuang, Z., Xiang, C., et al.: 6d-vnet: End-to-end 6-dof vehicle pose estimation from monocular rgb images. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (2019)

  38. Xie, S., Liu, S., Chen, Z., et al.: Attentional shapecontextnet for point cloud recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 4606–4615 (2018)

  39. Ye, X., Li, J., Huang, H., et al.: 3d recurrent neural networks with context fusion for point cloud semantic segmentation. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 403–417 (2018)

  40. Yue, K., Sun, M., Yuan, Y., et al.: Compact generalized non-local network. In: Advances in Neural Information Processing Systems, pp 6510–6519 (2018)

  41. Zeng, A., Yu, K.T., Song, S., et al.: Multi-view self-supervised deep learning for 6d pose estimation in the Amazon Picking Challenge. In: 2017 IEEE International Conference on Robotics and Automation (ICRA), IEEE, pp 1386–1383 (2017)

  42. Zhang, W., Xiao, C.: Pcan: 3d attention map learning using contextual information for point cloud based retrieval. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 12,436–12,445 (2019)

  43. Zhao, H., Jiang, L., Jia, J., et al.: Point transformer. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp 16,259–16,268 (2021)


Author information


Contributions

All the authors conceived the research, designed and implemented the algorithm, and drafted the submitted version of the paper.

Corresponding author

Correspondence to Dinh-Cuong Hoang.

Ethics declarations

Conflicts of Interest

The authors declare that they have no conflict of interest.

Ethics approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Hoang, DC., Nguyen, AN., Vu, VD. et al. Grasp Configuration Synthesis from 3D Point Clouds with Attention Mechanism. J Intell Robot Syst 109, 71 (2023). https://doi.org/10.1007/s10846-023-02007-w

