
Focal segmentation for robust 6D object pose estimation

Published in Multimedia Tools and Applications

Abstract

In augmented reality, 6D pose estimation of rigid objects remains a challenging problem. Most previous 6D pose estimation methods train deep neural networks to directly regress poses from input images or to predict the 2D locations of 3D keypoints for pose estimation; consequently, they are vulnerable to large occlusions. This study addresses the challenge of 6D pose estimation from a single RGB image under severe occlusion. A novel method is proposed that builds on PVNet and improves its performance. As in PVNet, our method regresses target object segments and pixel-wise direction vectors from an RGB image. The 2D locations of 3D keypoints are then computed from the direction vectors of the object pixels, and the 6D object pose is obtained using a PnP algorithm. However, accurately segmenting object pixels is difficult, particularly under severe occlusion. To this end, a focal segmentation mechanism is proposed that enables accurate and complete segmentation of occluded objects. Extensive experiments on the LINEMOD and LINEMOD-Occlusion datasets validate the effectiveness and superiority of our method. It improves the accuracy of PVNet by 1.09 and 5.14 on average in terms of the 2D reprojection and ADD metrics, respectively, without increasing the computational time.
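
To make the pipeline concrete, the following is a minimal sketch, not the authors' implementation, of the two steps named above: a focal-style segmentation loss in the spirit of Lin et al.'s focal loss [19], which down-weights easy pixels so that training concentrates on hard, occluded object pixels, and the final PnP step that recovers the 6D pose from 2D-3D keypoint correspondences [31]. The keypoint voting itself follows PVNet [18] and is omitted here; all function names and hyperparameters (alpha, gamma, the EPnP flag) are illustrative assumptions.

    import numpy as np
    import cv2
    import torch
    import torch.nn.functional as F

    def focal_segmentation_loss(logits, target, alpha=0.25, gamma=2.0):
        """Per-pixel binary focal loss for object segmentation (a sketch;
        the paper's focal segmentation mechanism may differ in detail).
        logits: (B, H, W) raw scores; target: (B, H, W) floats in {0, 1}."""
        bce = F.binary_cross_entropy_with_logits(logits, target, reduction="none")
        p = torch.sigmoid(logits)
        p_t = p * target + (1 - p) * (1 - target)             # prob. of the true class
        alpha_t = alpha * target + (1 - alpha) * (1 - target) # class-balance weight
        return (alpha_t * (1 - p_t) ** gamma * bce).mean()    # down-weight easy pixels

    def pose_from_keypoints(kps_3d, kps_2d, K):
        """Recover the 6D pose from voted keypoints with a PnP solver.
        kps_3d: (N, 3) keypoints on the object model; kps_2d: (N, 2) their
        voted 2D image locations; K: (3, 3) camera intrinsic matrix."""
        ok, rvec, tvec = cv2.solvePnP(
            kps_3d.astype(np.float64), kps_2d.astype(np.float64),
            K.astype(np.float64), distCoeffs=None,
            flags=cv2.SOLVEPNP_EPNP)                          # EPnP solver, as in [31]
        R, _ = cv2.Rodrigues(rvec)                            # axis-angle to 3x3 rotation
        return R, tvec

In this sketch, the 2D keypoint locations passed to solvePnP would come from PVNet-style RANSAC voting over the predicted pixel-wise direction vectors; the focal weighting (1 - p_t)^gamma is what shifts the segmentation toward hard, partially occluded pixels.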



Data Availability

The data that support the findings of this study are publicly available in the online repository: https://bop.felk.cvut.cz/datasets/.

References

  1. Drummond T, Cipolla R (2002) Real-time visual tracking of complex structures. IEEE Trans Pattern Anal Mach Intell 24(7):932–946. https://doi.org/10.1109/TPAMI.2002.1017620

  2. Gu R, Wang G, Hwang J-n (2019) Efficient multi-person hierarchical 3D pose estimation for autonomous driving. In: 2019 IEEE conference on multimedia information processing and retrieval (MIPR), pp 163–168. https://doi.org/10.1109/MIPR.2019.00036

  3. Kothari N, Gupta M, Vachhani L, Arya H (2017) Pose estimation for an autonomous vehicle using monocular vision. In: 2017 Indian control conference (ICC), pp 424–431. https://doi.org/10.1109/INDIANCC.2017.7846512

  4. Zhang S, Song C, Radkowski R (2019) Setforge - synthetic RGB-D training data generation to support CNN-based pose estimation for augmented reality. In: 2019 IEEE international symposium on mixed and augmented reality adjunct (ISMAR-Adjunct), pp 237–242. https://doi.org/10.1109/ISMAR-Adjunct.2019.00-39

  5. Lu Y, Kourian S, Salvaggio C, Xu C, Lu G (2019) Single image 3D vehicle pose estimation for augmented reality. In: 2019 IEEE global conference on signal and information processing (GlobalSIP), pp 1–5. https://doi.org/10.1109/GlobalSIP45357.2019.8969201

  6. Hachiuma R, Saito H (2016) Recognition and pose estimation of primitive shapes from depth images for spatial augmented reality. In: 2016 IEEE 2nd workshop on everyday virtual reality (WEVR), pp 32–35. https://doi.org/10.1109/WEVR.2016.7859541

  7. Li X, Ling H (2020) Hybrid camera pose estimation with online partitioning for SLAM. IEEE Robot Autom Lett 5:1453–1460. https://doi.org/10.1109/LRA.2020.2967688

  8. Ruan X, Wang F, Huang J (2019) Relative pose estimation of visual SLAM based on convolutional neural networks. In: 2019 Chinese control conference (CCC), pp 8827–8832. https://doi.org/10.23919/ChiCC.2019.8870974

  9. Xiao Z, Wang X, Wang J, Wu Z (2017) Monocular ORB SLAM based on initialization by marker pose estimation. In: 2017 IEEE international conference on information and automation (ICIA), pp 678–682. https://doi.org/10.1109/ICInfA.2017.8078992

10. Malyavej V, Torteeka P, Wongkharn S, Wiangtong T (2009) Pose estimation of unmanned ground vehicle based on dead-reckoning/GPS sensor fusion by unscented Kalman filter. In: 2009 6th International conference on electrical engineering/electronics, computer, telecommunications and information technology, vol 1, pp 395–398. https://doi.org/10.1109/ECTICON.2009.5137033

  11. Wang C, Xu D, Zhu Y, Martin-Martin R, Lu C, Fei-Fei L, Savarese S (2019) DenseFusion: 6D object pose estimation by iterative dense fusion. In: 2019 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 3338–3347. https://doi.org/10.1109/CVPR.2019.00346

  12. Park K, Patten T, Vincze M (2019) Pix2Pose: Pixel-wise coordinate regression of objects for 6D pose estimation. In: 2019 IEEE/CVF international conference on computer vision (ICCV), pp 7667–7676. https://doi.org/10.1109/ICCV.2019.00776

  13. Kehl W, Manhardt F, Tombari F, Ilic S, Navab N (2017) SSD-6D: Making rgb-based 3D detection and 6D pose estimation great again. In: 2017 IEEE international conference on computer vision (ICCV), pp 1530–1538. https://doi.org/10.1109/ICCV.2017.169

  14. Redmon J, Farhadi A (2018) YOLOv3: An incremental improvement. arXiv:1804.02767

  15. Rad M, Lepetit V (2017) BB8: A scalable, accurate, robust to partial occlusion method for predicting the 3D poses of challenging objects without using depth. In: 2017 IEEE international conference on computer vision (ICCV), pp 3848–3856. https://doi.org/10.1109/ICCV.2017.413

  16. Kendall A, Grimes M, Cipolla R (2015) PoseNet: A convolutional network for real-time 6-DOF camera relocalization. In: 2015 IEEE international conference on computer vision (ICCV), pp 2938–2946. https://doi.org/10.1109/ICCV.2015.336

  17. Xiang Y, Schmidt T, Narayanan V, Fox D (2018) PoseCNN: A convolutional neural network for 6D object pose estimation in cluttered scenes. arXiv:1711.00199

  18. Peng S, Zhou X, Liu Y, Lin H, Huang Q, Bao H (2022) PVNet: Pixel-wise voting network for 6DoF object pose estimation. IEEE Trans Pattern Anal Mach Intell 44(06):3212–3223. https://doi.org/10.1109/TPAMI.2020.3047388

  19. Lin T, Goyal P, Girshick R, He K, Dollar P (2020) Focal loss for dense object detection. IEEE Trans Pattern Anal Mach Intell 42(02):318–327. https://doi.org/10.1109/TPAMI.2018.2858826

  20. Doumanoglou A, Kouskouridas R, Malassiotis S, Kim T-K (2016) Recovering 6D object pose and predicting next-best-view in the crowd. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 3583–3592. https://doi.org/10.1109/CVPR.2016.390

  21. Hinterstoisser S, Lepetit V, Rajkumar N, Konolige K (2016) Going further with point pair features. In: Leibe B, Matas J, Sebe N, Welling M (eds) Computer vision – ECCV 2016, pp 834–848

  22. Tekin B, Sinha S, Fua P (2018) Real-time seamless single shot 6D object pose prediction. In: 2018 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 292–301. https://doi.org/10.1109/CVPR.2018.00038

23. Zhao Z, Peng G, Wang H, Fang H-S, Li C, Lu C (2018) Estimating 6D pose from localizing designated surface keypoints. https://doi.org/10.48550/arXiv.1812.01387

  24. Zakharov S, Shugurov I, Ilic S (2019) DPOD: 6D pose object detector and refiner. In: 2019 IEEE/CVF international conference on computer vision (ICCV), pp 1941–1950. https://doi.org/10.1109/ICCV.2019.00203

  25. Song C, Song J, Huang Q (2020) HybridPose: 6D object pose estimation under hybrid representations. In: 2020 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 428–437. https://doi.org/10.1109/CVPR42600.2020.00051

  26. Haugaard R, Buch A (2022) SurfEmb: Dense and continuous correspondence distributions for object pose estimation with learnt surface embeddings. In: 2022 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 6739–6748. https://doi.org/10.1109/CVPR52688.2022.00663

  27. Do T-T, Cai M, Pham T, Reid I (2018) Deep-6DPose: Recovering 6D object pose from a single RGB image. https://doi.org/10.48550/arXiv.1802.10367

  28. Wang G, Manhardt F, Tombari F, Ji X (2021) GDR-Net: Geometry-guided direct regression network for monocular 6D object pose estimation. In: 2021 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 16606–16616. https://doi.org/10.1109/CVPR46437.2021.01634

  29. Pavlakos G, Zhou X, Chan A, Derpanis KG, Daniilidis K (2017) 6-DoF object pose from semantic keypoints. In: 2017 IEEE international conference on robotics and automation (ICRA), pp 2011–2018. https://doi.org/10.1109/ICRA.2017.7989233

  30. Oberweger M, Rad M, Lepetit V (2018) Making deep heatmaps robust to partial occlusions for 3D object pose estimation. In: Ferrari V, Hebert M, Sminchisescu C, Weiss Y (eds) Computer vision – ECCV 2018, pp 125–141

  31. Lepetit V, Moreno-Noguer F, Fua P (2009) EPnP: An accurate O(n) solution to the PnP problem. Int J Comput Vis 81(2):155–166. https://doi.org/10.1007/s11263-008-0152-6

  32. Girshick R (2015) Fast R-CNN. In: 2015 IEEE international conference on computer vision (ICCV), pp 1440–1448. https://doi.org/10.1109/ICCV.2015.169

  33. https://github.com/zju3dv/pvnet. Accessed 12 Dec 2022

  34. Hinterstoisser S, Cagniart C, Ilic S, Sturm P, Navab N, Fua P, Lepetit V (2012) Gradient response maps for real-time detection of textureless objects. IEEE Trans Pattern Anal Mach Intell 34(5):876–888. https://doi.org/10.1109/TPAMI.2011.206

  35. Brachmann E, Krull A, Michel F, Gumhold S, Shotton J, Rother C (2014) Learning 6D object pose estimation using 3D object coordinates. In: Fleet D, Pajdla T, Schiele B, Tuytelaars T (eds) Computer vision – ECCV 2014, pp 536–551

  36. Brachmann E, Michel F, Krull A, Yang MY, Gumhold S, Rother C (2016) Uncertainty-driven 6D pose estimation of objects and scenes from a single RGB image. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), pp 3364–3372. https://doi.org/10.1109/CVPR.2016.366

  37. Hinterstoisser S, Lepetit V, Ilic S, Holzer S, Bradski G, Konolige K, Navab N (2013) Model based training, detection and pose estimation of texture-less 3D objects in heavily cluttered scenes. In: Lee KM, Matsushita Y, Rehg JM, Hu Z (eds) Computer vision – ACCV 2012, pp 548–562

  38. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), pp 770–778. https://doi.org/10.1109/CVPR.2016.90

  39. Li Z, Peng C, Yu G, Zhang X, Deng Y, Sun J (2018) DetNet: Design backbone for object detection. In: Ferrari V, Hebert M, Sminchisescu C, Weiss Y (eds) Computer vision – ECCV 2018, pp 339–354

Acknowledgements

This work was supported by a National Research Foundation of Korea (NRF) grant funded by the Korean Government (MSIT) under Grant 2021R1F1A1045749.

Author information

Corresponding author

Correspondence to Hanhoon Park.

Ethics declarations

Conflicts of interest

We have no conflict of interest to declare.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ye, Y., Park, H. Focal segmentation for robust 6D object pose estimation. Multimed Tools Appl 83, 47563–47585 (2024). https://doi.org/10.1007/s11042-023-16937-y

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11042-023-16937-y

Keywords

Navigation