Transformer Based Feature Pyramid Network for Transparent Objects Grasp

Zhang, Jiawei; Liu, Houde; Xia, Chongkun

doi:10.1007/978-3-031-13822-5_37

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13456)

Included in the following conference series:

International Conference on Intelligent Robotics and Applications

Abstract

Transparent objects such as glass bottles and plastic cups are common in daily life, yet few methods grasp them reliably because of their unique optical properties. Beyond the difficulty of the task itself, no dataset exists for transparent object grasping. To address this, we propose an efficient dataset construction pipeline that labels grasp poses for transparent objects. Using the Blender physics engine, the pipeline generates large numbers of photo-realistic images and labels their grasp poses in a short time. We also propose TTG-Net, a transformer-based feature pyramid network for generating planar grasp poses; it uses a feature pyramid network with residual modules to extract features and a transformer encoder to refine them with better global information. TTG-Net is trained entirely on the virtual dataset generated by our pipeline and reaches 80.4% validation accuracy on that dataset. To test its effectiveness on real-world data, we also evaluate TTG-Net on photos randomly captured in our lab, where it achieves 73.4% accuracy, indicating strong sim2real generalization. We further evaluate other mainstream methods on our dataset, and TTG-Net shows better generalization ability.
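
As an illustration of what a planar grasp pose can look like, the sketch below uses the oriented-rectangle convention that is common in planar grasp detection. This is an assumption: the abstract does not state how TTG-Net parameterizes its output, and the class name `PlanarGrasp` and its fields are hypothetical, chosen only for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class PlanarGrasp:
    """Hypothetical 5-parameter planar grasp (an assumed convention, not from the paper)."""
    x: float       # grasp center, image column (pixels)
    y: float       # grasp center, image row (pixels)
    theta: float   # gripper rotation in the image plane (radians)
    width: float   # gripper opening (pixels)
    height: float  # jaw size (pixels)

    def corners(self):
        """Return the four corners of the oriented grasp rectangle as (x, y) tuples."""
        c, s = math.cos(self.theta), math.sin(self.theta)
        hw, hh = self.width / 2.0, self.height / 2.0
        # Corners in the gripper frame, listed in order around the rectangle.
        local = [(-hw, -hh), (hw, -hh), (hw, hh), (-hw, hh)]
        # Rotate each corner by theta, then translate to the grasp center.
        return [(self.x + dx * c - dy * s, self.y + dx * s + dy * c)
                for dx, dy in local]


# Axis-aligned grasp centered at (100, 50) with a 40 x 20 pixel rectangle.
grasp = PlanarGrasp(x=100.0, y=50.0, theta=0.0, width=40.0, height=20.0)
for corner in grasp.corners():
    print(corner)
```

A rectangle of this kind is convenient because a predicted grasp can be compared against a labeled one using overlap and angle difference, although the abstract does not say which criterion underlies the reported accuracy figures.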

Supported by the Shenzhen Science Fund for Distinguished Young Scholars (RCJC20210706091946001) and the Guangdong Special Branch Plan for Young Talent with Scientific and Technological Innovation (2019TQ05Z111).

Author information

Authors and Affiliations

Tsinghua University, Hai Dian, Beijing, China
Jiawei Zhang, Houde Liu & Chongkun Xia
AI and Robot Laboratory, Tsinghua Shenzhen Graduate School, Shen Zhen, China
Jiawei Zhang, Houde Liu & Chongkun Xia

Corresponding author

Correspondence to Houde Liu.

Editor information

Editors and Affiliations

Harbin Institute of Technology, Shenzhen, China
Honghai Liu
Huazhong University of Science and Technology, Wuhan, China
Zhouping Yin
Shenyang Institute of Automation, Shenyang, Liaoning, China
Lianqing Liu
Harbin Institute of Technology, Harbin, China
Li Jiang
Shanghai Jiao Tong University, Shanghai, China
Guoying Gu
Shenzhen Institutes of Advanced Technology, Shenzhen, China
Xinyu Wu
Harbin Institute of Technology, Shenzhen, China
Weihong Ren

Cite this paper

Zhang, J., Liu, H., Xia, C. (2022). Transformer Based Feature Pyramid Network for Transparent Objects Grasp. In: Liu, H., et al. Intelligent Robotics and Applications. ICIRA 2022. Lecture Notes in Computer Science, vol 13456. Springer, Cham. https://doi.org/10.1007/978-3-031-13822-5_37

DOI: https://doi.org/10.1007/978-3-031-13822-5_37
Published: 04 August 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-13821-8
Online ISBN: 978-3-031-13822-5
eBook Packages: Computer Science, Computer Science (R0)
