Transformer Based Feature Pyramid Network for Transparent Objects Grasp

  • Conference paper
  • Published in: Intelligent Robotics and Applications (ICIRA 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13456)


Abstract

Transparent objects such as glass bottles and plastic cups are common in daily life, yet few works grasp them reliably because of their unique optical properties. Beyond the inherent difficulty of the task, no dataset exists for transparent object grasping. To address this problem, we propose an efficient dataset construction pipeline that labels grasp poses for transparent objects. Using the Blender physics engine, our pipeline can generate numerous photo-realistic images and label grasp poses in a short time. We also propose TTG-Net, a transformer-based feature pyramid network for generating planar grasp poses: a feature pyramid network with residual modules extracts features, and a transformer encoder refines them to capture global information. TTG-Net is trained entirely on the virtual dataset generated by our pipeline and achieves 80.4% validation accuracy on it. To verify its effectiveness on real-world data, we also test TTG-Net on photos randomly captured in our lab, where it achieves 73.4% accuracy, demonstrating remarkable sim-to-real generalization. We also evaluate other mainstream methods on our dataset; TTG-Net shows better generalization ability.
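The abstract's two technical contributions can be sketched concretely. First, the dataset pipeline: the paper reports using the Blender physics engine to generate photo-realistic images with grasp-pose labels. The following is a minimal, hypothetical bpy sketch of such a loop (drop objects under rigid-body physics, render a frame, record the settled poses from which grasp labels could be projected). The object handling, file paths, and the deferred labeling step are assumptions for illustration, not the authors' pipeline.

    import json
    import bpy

    scene = bpy.context.scene
    scene.render.engine = 'CYCLES'  # ray-traced rendering for glass/plastic

    # Give every mesh object rigid-body physics so it falls and settles.
    for obj in scene.objects:
        if obj.type == 'MESH' and obj.rigid_body is None:
            bpy.context.view_layer.objects.active = obj
            bpy.ops.rigidbody.object_add()

    # Step the simulation frame by frame so the rigid-body cache is filled.
    for frame in range(1, 101):
        scene.frame_set(frame)

    # Record each object's settled world pose; planar grasp rectangles can
    # be projected from these poses with the camera parameters afterwards.
    poses = {obj.name: [list(row) for row in obj.matrix_world]
             for obj in scene.objects if obj.type == 'MESH'}

    scene.render.filepath = '/tmp/sample_0001.png'
    bpy.ops.render.render(write_still=True)
    with open('/tmp/sample_0001.json', 'w') as f:
        json.dump(poses, f)

Second, the network: the abstract describes a residual feature-extraction backbone whose features are refined by a transformer encoder, ending in planar grasp outputs. Below is a minimal PyTorch sketch of that pattern, with a simplified encoder-decoder standing in for the paper's feature pyramid. All layer sizes, depths, the cos/sin angle encoding, and the per-pixel head design are illustrative assumptions, not the TTG-Net implementation.

    import torch
    import torch.nn as nn

    class ResidualBlock(nn.Module):
        def __init__(self, ch):
            super().__init__()
            self.conv1 = nn.Conv2d(ch, ch, 3, padding=1)
            self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            return self.relu(x + self.conv2(self.relu(self.conv1(x))))

    class PlanarGraspNet(nn.Module):
        def __init__(self, dim=64, nhead=4, depth=2):
            super().__init__()
            # Pyramid-style backbone with residual modules (hypothetical sizes).
            self.stem = nn.Conv2d(3, dim, 7, stride=2, padding=3)
            self.stage1 = nn.Sequential(ResidualBlock(dim), nn.MaxPool2d(2))
            self.stage2 = nn.Sequential(ResidualBlock(dim), nn.MaxPool2d(2))
            # Transformer encoder refines the coarsest map for global context.
            layer = nn.TransformerEncoderLayer(d_model=dim, nhead=nhead,
                                               batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
            self.up = nn.Upsample(scale_factor=8, mode="bilinear",
                                  align_corners=False)
            # Per-pixel planar grasp heads: quality, angle (cos/sin), width.
            self.quality = nn.Conv2d(dim, 1, 1)
            self.angle = nn.Conv2d(dim, 2, 1)
            self.width = nn.Conv2d(dim, 1, 1)

        def forward(self, img):
            f = self.stage2(self.stage1(self.stem(img)))  # (B, C, H/8, W/8)
            b, c, h, w = f.shape
            tokens = f.flatten(2).transpose(1, 2)         # (B, H*W, C)
            f = self.encoder(tokens).transpose(1, 2).reshape(b, c, h, w)
            f = self.up(f)                                # back to input size
            return self.quality(f), self.angle(f), self.width(f)

    net = PlanarGraspNet()
    q, a, w = net(torch.randn(1, 3, 224, 224))  # per-pixel grasp maps

In this style of pixel-wise planar grasp detection, a grasp is typically read off at the pixel with the highest quality score, decoding the orientation from the cos/sin pair and the gripper opening from the width map at that location.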

Supported by the Shenzhen Science Fund for Distinguished Young Scholars (RCJC20210706091946001) and the Guangdong Special Branch Plan for Young Talent with Scientific and Technological Innovation (2019TQ05Z111).

Author information

Corresponding author

Correspondence to Houde Liu.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Zhang, J., Liu, H., Xia, C. (2022). Transformer Based Feature Pyramid Network for Transparent Objects Grasp. In: Liu, H., et al. (eds.) Intelligent Robotics and Applications. ICIRA 2022. Lecture Notes in Computer Science, vol. 13456. Springer, Cham. https://doi.org/10.1007/978-3-031-13822-5_37

  • DOI: https://doi.org/10.1007/978-3-031-13822-5_37

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-13821-8

  • Online ISBN: 978-3-031-13822-5

  • eBook Packages: Computer Science, Computer Science (R0)
