Abstract
Transparent objects are ubiquitous in household settings and pose distinct challenges for visual sensing and perception systems. The optical properties of transparent objects make conventional 3D sensors, on their own, unreliable for object depth and pose estimation. These challenges are compounded by the shortage of large-scale RGB-Depth datasets focusing on transparent objects in real-world settings. In this work, we contribute a large-scale real-world RGB-Depth transparent object dataset named ClearPose to serve as a benchmark for segmentation, scene-level depth completion, and object-centric pose estimation tasks. The ClearPose dataset contains over 350K labeled real-world RGB-Depth frames and 5M instance annotations covering 63 household objects. The dataset includes object categories commonly used in daily life under various lighting and occluding conditions, as well as challenging test scenarios such as occlusion by opaque or translucent objects, non-planar orientations, and the presence of liquids. We benchmark several state-of-the-art depth completion and object pose estimation deep neural networks on ClearPose. The dataset and benchmarking source code are available at https://github.com/opipari/ClearPose.
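To make the depth completion benchmark concrete, the sketch below shows how a masked depth error metric can be computed on a completed depth map, restricted to transparent-object pixels. This is a minimal illustration of the general idea, not the paper's actual evaluation code; the function name and the toy arrays are hypothetical, and the real ClearPose benchmark may use different metrics and data loading.

```python
import numpy as np

def masked_depth_rmse(pred_depth, gt_depth, mask):
    """RMSE (in the depth map's units) between predicted and ground-truth
    depth, restricted to pixels inside `mask` (e.g. transparent-object
    regions from a segmentation label) that have valid ground truth (> 0)."""
    valid = mask & (gt_depth > 0)
    diff = pred_depth[valid] - gt_depth[valid]
    return float(np.sqrt(np.mean(diff ** 2)))

# Toy example with synthetic 4x4 depth maps (meters).
gt = np.full((4, 4), 1.0)
pred = gt + 0.1           # uniform 10 cm prediction error
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True     # pretend these pixels cover a transparent object

print(round(masked_depth_rmse(pred, gt, mask), 3))  # → 0.1
```

Restricting the metric to a mask matters for transparent objects: raw sensor depth is often missing or wrong exactly on those pixels, so a scene-wide average would understate the error a completion network must correct.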
Acknowledgement
We gratefully acknowledge the support of Dr. Peter Gaskell and Weishu Wu at the University of Michigan, who provided devices and objects for dataset collection.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Chen, X., Zhang, H., Yu, Z., Opipari, A., Chadwicke Jenkins, O. (2022). ClearPose: Large-scale Transparent Object Dataset and Benchmark. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13668. Springer, Cham. https://doi.org/10.1007/978-3-031-20074-8_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20073-1
Online ISBN: 978-3-031-20074-8
eBook Packages: Computer Science; Computer Science (R0)