Abstract
Transparent objects such as glass bottles and plastic cups are common in daily life, yet few methods grasp them reliably because of their unique optical properties. Beyond the difficulty of the task itself, no dataset exists for transparent object grasping. To address this problem, we propose an efficient dataset construction pipeline that labels grasp poses for transparent objects. Using the Blender physics engine, the pipeline generates large numbers of photo-realistic images with labeled grasp poses in a short time. We also propose TTG-Net, a transformer-based feature pyramid network for generating planar grasp poses, which uses a feature pyramid network with residual modules to extract features and a transformer encoder to refine them with global information. TTG-Net is trained entirely on the virtual dataset generated by our pipeline and reaches 80.4% validation accuracy on it. To assess its effectiveness on real-world data, we also test TTG-Net on photos randomly captured in our lab, where it reaches 73.4% accuracy on this real-world benchmark, indicating strong sim2real generalization. We also evaluate other mainstream methods on our dataset; TTG-Net shows better generalization ability.
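The abstract does not spell out how a planar grasp pose is parameterized. As a minimal sketch, assuming the oriented-rectangle convention common in the planar grasp detection literature (centre, rotation angle, opening width, jaw height), a grasp label could be represented as follows. The class name `PlanarGrasp` and its fields are hypothetical and are not taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class PlanarGrasp:
    """Hypothetical planar grasp label: an oriented rectangle in image space."""
    x: float       # grasp centre, pixels
    y: float       # grasp centre, pixels
    theta: float   # gripper rotation about the image normal, radians
    width: float   # gripper opening along the grasp axis, pixels
    height: float  # jaw size perpendicular to the grasp axis, pixels

    def corners(self):
        """Return the four rectangle corners as (x, y) tuples."""
        c, s = math.cos(self.theta), math.sin(self.theta)
        hw, hh = self.width / 2.0, self.height / 2.0
        # Corner offsets in the gripper frame, before rotation.
        offsets = [(-hw, -hh), (hw, -hh), (hw, hh), (-hw, hh)]
        # Rotate each offset by theta and translate to the grasp centre.
        return [
            (self.x + dx * c - dy * s, self.y + dx * s + dy * c)
            for dx, dy in offsets
        ]


if __name__ == "__main__":
    grasp = PlanarGrasp(x=10.0, y=20.0, theta=0.0, width=4.0, height=2.0)
    print(grasp.corners())
```

Under this assumed convention, a network that outputs planar grasps only needs to regress these five numbers per candidate, and the corner form is convenient for drawing predictions or comparing them with labeled rectangles.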
Supported by the Shenzhen Science Fund for Distinguished Young Scholars (RCJC20210706091946001) and the Guangdong Special Branch Plan for Young Talent with Scientific and Technological Innovation (2019TQ05Z111).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zhang, J., Liu, H., Xia, C. (2022). Transformer Based Feature Pyramid Network for Transparent Objects Grasp. In: Liu, H., et al. Intelligent Robotics and Applications. ICIRA 2022. Lecture Notes in Computer Science, vol 13456. Springer, Cham. https://doi.org/10.1007/978-3-031-13822-5_37
DOI: https://doi.org/10.1007/978-3-031-13822-5_37
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-13821-8
Online ISBN: 978-3-031-13822-5
eBook Packages: Computer Science, Computer Science (R0)