HFE-Net: hierarchical feature extraction and coordinate conversion of point cloud for object 6D pose estimation

Shen, Ze; Chu, Hao; Wang, Fei; Guo, Yi; Liu, Shangdong; Han, Shuai

doi:10.1007/s00521-023-09241-1

Original Article
Published: 30 November 2023

Volume 36, pages 3167–3178, (2024)
Neural Computing and Applications

Ze Shen¹,
Hao Chu¹,
Fei Wang ORCID: orcid.org/0000-0001-8296-8039¹,
Yi Guo¹,
Shangdong Liu¹ &
…
Shuai Han²

Abstract

The main challenges in learning a robust 6D pose are noise in RGB/RGB-D images, sparsity of the point cloud, and severe occlusion. Object geometric information is critical to tackling these problems. In this work, we present a novel pipeline for 6DoF object pose estimation. Unlike previous methods that directly regress pose parameters or predict keypoints, we tackle this task with a point-pair-based approach and leverage geometric information as much as possible. Specifically, at the representation learning stage, we build a point cloud network based on a locally modeling CNN to encode the point cloud; it extracts effective geometric features while the point cloud is projected into a high-dimensional space. Moreover, we design a coordinate conversion network that regresses the point cloud in the object coordinate system in a decoding manner. The pose can then be calculated with a point-pair matching algorithm. Experimental results show that our method achieves state-of-the-art performance on several datasets.
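The final step of the pipeline, recovering a pose from matched point pairs, can be illustrated with a minimal sketch. The abstract does not specify which matching algorithm the authors use, so the code below substitutes the standard Kabsch (SVD-based least-squares) alignment as an assumption. It takes points observed in the camera frame and the corresponding points in the object coordinate system (standing in for what the coordinate conversion network would regress) and returns a rotation and translation. The function name `estimate_rigid_transform` and the synthetic data are hypothetical and are not taken from the paper.

```python
import numpy as np


def estimate_rigid_transform(obj_pts, cam_pts):
    """Least-squares R, t such that cam_pts ~= obj_pts @ R.T + t (Kabsch).

    obj_pts: (N, 3) points in the object coordinate system.
    cam_pts: (N, 3) corresponding points in the camera frame.
    Illustrative stand-in only, not the paper's matching algorithm.
    """
    obj_pts = np.asarray(obj_pts, dtype=float)
    cam_pts = np.asarray(cam_pts, dtype=float)

    # Center both point sets so that only the rotation remains to be solved.
    obj_centroid = obj_pts.mean(axis=0)
    cam_centroid = cam_pts.mean(axis=0)

    # 3x3 cross-covariance between the centered point pairs.
    H = (obj_pts - obj_centroid).T @ (cam_pts - cam_centroid)
    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so the result is a proper rotation
    # (determinant +1) rather than a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cam_centroid - R @ obj_centroid
    return R, t


if __name__ == "__main__":
    # Synthetic check: apply a known pose to random object-frame points.
    rng = np.random.default_rng(0)
    obj = rng.normal(size=(100, 3))
    angle = 0.5
    R_true = np.array([
        [np.cos(angle), -np.sin(angle), 0.0],
        [np.sin(angle), np.cos(angle), 0.0],
        [0.0, 0.0, 1.0],
    ])
    t_true = np.array([0.2, -0.1, 0.8])
    cam = obj @ R_true.T + t_true

    R_est, t_est = estimate_rigid_transform(obj, cam)
    print(np.allclose(R_est, R_true), np.allclose(t_est, t_true))
```

With noise-free correspondences the sketch recovers the applied pose up to floating-point error. With noisy or partially wrong correspondences, a robust wrapper such as RANSAC around this solver would be a common choice, though whether the paper does so is not stated in the abstract.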

Data availability

The data/reanalysis that support the findings of this study are publicly available online at https://drive.google.com/file/d/1if4VoEXNx9W3XCn0Y7Fp15B4GpcYbyYi/view and https://drive.google.com/drive/folders/19ivHpaKm9dOrr12fzC8IDFczWRPFxho7.

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grants 61973065 and 52075531, the Fundamental Research Funds for the Central Universities of China under Grant N2104008, and the Central Government Guides Local Science and Technology Development Special Fund under Grant 2021JH6/10500129.

Author information

Authors and Affiliations

Faculty of Robot Science and Engineering, Northeastern University, Shenyang, 110169, Liaoning, China
Ze Shen, Hao Chu, Fei Wang, Yi Guo & Shangdong Liu
Department of Neurosurgery, Shengjing Hospital of China Medical University, Shenyang, 110055, Liaoning, China
Shuai Han

Corresponding author

Correspondence to Fei Wang.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Shen, Z., Chu, H., Wang, F. et al. HFE-Net: hierarchical feature extraction and coordinate conversion of point cloud for object 6D pose estimation. Neural Comput & Applic 36, 3167–3178 (2024). https://doi.org/10.1007/s00521-023-09241-1

Received: 06 March 2023
Accepted: 03 November 2023
Published: 30 November 2023
Issue Date: February 2024
DOI: https://doi.org/10.1007/s00521-023-09241-1
