Gen6D: Generalizable Model-Free 6-DoF Object Pose Estimation from RGB Images

  • Conference paper
Computer Vision – ECCV 2022 (ECCV 2022)

Abstract

In this paper, we present Gen6D, a generalizable model-free 6-DoF object pose estimator. Existing generalizable pose estimators either require high-quality object models or need additional depth maps or object masks at test time, which significantly limits their application scope. In contrast, our pose estimator only requires some posed images of the unseen object and is able to accurately predict its poses in arbitrary environments. Gen6D consists of an object detector, a viewpoint selector and a pose refiner, none of which requires the 3D object model, and all of which generalize to unseen objects. Experiments show that Gen6D achieves state-of-the-art results on two model-free datasets: the MOPED dataset and a new GenMOP dataset. In addition, on the LINEMOD dataset, Gen6D achieves competitive results compared with instance-specific pose estimators. Project page: https://liuyuan-pal.github.io/Gen6D/.
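The abstract describes a three-stage pipeline: detect the object in the query image, select the posed reference view whose viewpoint best matches the query, and refine the resulting pose. The toy sketch below illustrates that control flow on pre-extracted features; it is not the paper's implementation, and all names (`select_viewpoint`, `estimate_pose`, the no-op refiner) are hypothetical stand-ins for the learned components.

```python
import numpy as np

def select_viewpoint(query_feat, ref_feats):
    """Toy viewpoint selector: pick the reference image whose global
    feature is most similar (cosine similarity) to the query crop's
    feature. Its known pose then serves as the initial estimate."""
    sims = ref_feats @ query_feat / (
        np.linalg.norm(ref_feats, axis=1) * np.linalg.norm(query_feat) + 1e-8)
    return int(np.argmax(sims))

def estimate_pose(query_feat, ref_feats, ref_poses, refine_steps=1):
    """Sketch of the three-stage pipeline on pre-extracted features:
    detection is assumed already done (query_feat comes from the detected
    crop), viewpoint selection picks the closest posed reference view,
    and refinement here is a placeholder loop that would iteratively
    update the pose in a real system."""
    idx = select_viewpoint(query_feat, ref_feats)
    pose = ref_poses[idx]          # 4x4 pose of the matched reference view
    for _ in range(refine_steps):  # a learned refiner would update `pose`
        pass
    return idx, pose
```

The key property the sketch mirrors is that no 3D model is needed: the object is represented only by reference images with known poses, so an unseen object can be handled by capturing and posing a few images of it.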



Author information


Corresponding author

Correspondence to Wenping Wang.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 3734 KB)


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Liu, Y. et al. (2022). Gen6D: Generalizable Model-Free 6-DoF Object Pose Estimation from RGB Images. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13692. Springer, Cham. https://doi.org/10.1007/978-3-031-19824-3_18


  • DOI: https://doi.org/10.1007/978-3-031-19824-3_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19823-6

  • Online ISBN: 978-3-031-19824-3

  • eBook Packages: Computer Science, Computer Science (R0)
