HFE-Net: hierarchical feature extraction and coordinate conversion of point cloud for object 6D pose estimation

  • Original Article
Neural Computing and Applications

Abstract

Learning a robust 6D pose remains challenging because of noise in RGB/RGB-D images, sparsity of the point cloud, and severe occlusion. Object geometric information is critical to tackling these problems. In this work, we present a novel pipeline for 6DoF object pose estimation. Unlike previous methods that directly regress pose parameters or predict keypoints, we tackle this task with a point-pair-based approach and leverage geometric information as much as possible. Specifically, at the representation learning stage, we build a point cloud network based on a locally modeling CNN to encode the point cloud; it extracts effective geometric features while the point cloud is projected into a high-dimensional space. Moreover, we design a coordinate conversion network that regresses, in a decoding manner, the point cloud in the object coordinate system. The pose can then be calculated through a point-pair matching algorithm. Experimental results show that our method achieves state-of-the-art performance on several datasets.
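The final step of the pipeline, recovering a pose from matched point pairs, can be illustrated with a minimal sketch. The preview does not specify which matching algorithm the authors use, so the code below assumes the simplest case: each observed camera-frame point is already paired with its regressed object-frame coordinate, and the rigid transform is fitted by the standard SVD-based least-squares (Kabsch) solution. The function name `rigid_transform_from_pairs` and the synthetic data are hypothetical and are not taken from the paper.

```python
import numpy as np


def rigid_transform_from_pairs(obj_pts, cam_pts):
    """Least-squares rotation R and translation t with cam_pts ~= obj_pts @ R.T + t.

    obj_pts, cam_pts: (N, 3) arrays of paired points (row i of one matches row i
    of the other). This is the classic Kabsch solution, used here only as an
    assumed stand-in for the paper's point-pair matching step.
    """
    obj_pts = np.asarray(obj_pts, dtype=float)
    cam_pts = np.asarray(cam_pts, dtype=float)

    # Centre both point sets so that only the rotation remains to be solved.
    obj_centroid = obj_pts.mean(axis=0)
    cam_centroid = cam_pts.mean(axis=0)

    # 3x3 cross-covariance between the centred object and camera points.
    H = (obj_pts - obj_centroid).T @ (cam_pts - cam_centroid)
    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so the result is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cam_centroid - R @ obj_centroid
    return R, t


# Synthetic check: rotate and translate random "object" points, then recover the pose.
rng = np.random.default_rng(0)
angle = np.deg2rad(30.0)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([0.1, -0.2, 0.5])
obj_cloud = rng.normal(size=(50, 3))
cam_cloud = obj_cloud @ R_true.T + t_true

R_est, t_est = rigid_transform_from_pairs(obj_cloud, cam_cloud)
print("max rotation error:", np.abs(R_est - R_true).max())
print("max translation error:", np.abs(t_est - t_true).max())
```

With noisy or partially wrong correspondences, a closed-form fit like this is usually wrapped in a robust scheme such as RANSAC; whether the authors do so is not stated in the preview.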

Data availability

The data/reanalysis that support the findings of this study are publicly available online at https://drive.google.com/file/d/1if4VoEXNx9W3XCn0Y7Fp15B4GpcYbyYi/view and https://drive.google.com/drive/folders/19ivHpaKm9dOrr12fzC8IDFczWRPFxho7.

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grants 61973065 and 52075531, the Fundamental Research Funds for the Central Universities of China under Grant N2104008, and the Central Government Guides the Local Science and Technology Development Special Fund under Grant 2021JH6/10500129.

Author information

Corresponding author

Correspondence to Fei Wang.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Shen, Z., Chu, H., Wang, F. et al. HFE-Net: hierarchical feature extraction and coordinate conversion of point cloud for object 6D pose estimation. Neural Comput & Applic 36, 3167–3178 (2024). https://doi.org/10.1007/s00521-023-09241-1

  • DOI: https://doi.org/10.1007/s00521-023-09241-1