Abstract
Although category-level 9DoF object pose estimation has emerged recently, previous correspondence-based and direct regression methods are both limited in accuracy by the large intra-category variations in object shape, color, and other attributes. Orthogonal to them, this work presents CATRE, a category-level object pose and size refiner that iteratively enhances pose estimates from point clouds to produce accurate results. Given an initial pose estimate, CATRE predicts the relative transformation between the initial pose and the ground truth by aligning the partially observed point cloud with an abstract shape prior. Specifically, we propose a novel disentangled architecture that is aware of the inherent distinctions between rotation and translation/size estimation. Extensive experiments show that our approach remarkably outperforms state-of-the-art methods on the REAL275, CAMERA25, and LM benchmarks at speeds of up to \(\approx 85.32\,\text{Hz}\), and achieves competitive results on category-level tracking. We further demonstrate that CATRE can perform pose refinement on an unseen category. Code and trained models are available (https://github.com/THU-DA-6D-Pose-Group/CATRE.git).
X. Liu and G. Wang contributed equally.
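The iterative refinement loop described in the abstract can be illustrated with a minimal NumPy sketch. It is not the CATRE implementation: the update rule (left-multiplied rotation, additive translation, multiplicative size) is one plausible parameterization assumed here for illustration, and `toy_predictor` is a hypothetical stand-in that reads the ground-truth pose directly, whereas the real method predicts the relative transformation with a network from the observed point cloud and a shape prior. All function names are hypothetical.

```python
import numpy as np


def rotvec_to_matrix(v):
    """Rodrigues' formula: axis-angle vector -> 3x3 rotation matrix."""
    theta = np.linalg.norm(v)
    if theta < 1e-12:
        return np.eye(3)
    k = v / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


def matrix_to_rotvec(R):
    """Inverse of rotvec_to_matrix, valid for rotation angles below pi."""
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    if theta < 1e-12:
        return np.zeros(3)
    axis = np.array([R[2, 1] - R[1, 2],
                     R[0, 2] - R[2, 0],
                     R[1, 0] - R[0, 1]]) / (2.0 * np.sin(theta))
    return theta * axis


def apply_delta(pose, delta):
    """Compose a 9DoF pose (R, t, s) with a relative update (dR, dt, ds).

    Assumed update rule: R <- dR R, t <- t + dt, s <- s * ds.
    """
    R, t, s = pose
    dR, dt, ds = delta
    return dR @ R, t + dt, s * ds


def toy_predictor(pose, target, step=0.5):
    """Hypothetical stand-in for the learned refiner.

    Returns a fraction `step` of the true residual between `pose` and
    `target`; a real refiner would have no access to `target`.
    """
    R, t, s = pose
    R_gt, t_gt, s_gt = target
    dR = rotvec_to_matrix(step * matrix_to_rotvec(R_gt @ R.T))
    dt = step * (t_gt - t)
    ds = (s_gt / s) ** step
    return dR, dt, ds


def refine(pose, predict_delta, n_iters=4):
    """Repeatedly predict a relative update and apply it to the estimate."""
    for _ in range(n_iters):
        pose = apply_delta(pose, predict_delta(pose))
    return pose


def rotation_error_deg(R_a, R_b):
    """Geodesic angle between two rotation matrices, in degrees."""
    return np.degrees(np.linalg.norm(matrix_to_rotvec(R_a @ R_b.T)))
```

With this stand-in predictor each iteration halves the remaining rotation, translation, and log-size error, which shows why running the same refiner several times can tighten a coarse initial estimate. A learned predictor gives no such guarantee.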
Acknowledgments
We thank Yansong Tang at Tsinghua-Berkeley Shenzhen Institute, and Ruida Zhang and Haotian Xu at Tsinghua University, for their helpful suggestions. This work was supported by the National Key R&D Program of China under Grant 2018AAA0102801 and the National Natural Science Foundation of China under Grant 61620106005.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Liu, X., Wang, G., Li, Y., Ji, X. (2022). CATRE: Iterative Point Clouds Alignment for Category-Level Object Pose Refinement. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13662. Springer, Cham. https://doi.org/10.1007/978-3-031-20086-1_29
DOI: https://doi.org/10.1007/978-3-031-20086-1_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20085-4
Online ISBN: 978-3-031-20086-1
eBook Packages: Computer Science, Computer Science (R0)