Abstract
Estimating the 6D pose of rigid objects remains a challenging problem in augmented reality. Most previous methods train deep neural networks either to regress poses directly from input images or to predict the 2D locations of 3D keypoints, and they are therefore vulnerable to large occlusions. This study addresses 6D pose estimation from a single RGB image under severe occlusion. We propose a novel method that builds on PVNet and improves its performance. Like PVNet, our method regresses the target object segment and pixel-wise direction vectors from an RGB image. The 2D locations of the 3D keypoints are then computed from the direction vectors of the object pixels, and the 6D object pose is obtained with a PnP algorithm. However, accurately segmenting the object pixels is difficult, particularly under severe occlusion. To address this difficulty, we propose a focal segmentation mechanism that ensures accurate and complete segmentation of occluded objects. Extensive experiments on the LINEMOD and LINEMOD-Occlusion datasets validate the effectiveness and superiority of our method. On average, our method improves the accuracy of PVNet by 1.09 in terms of the 2D reprojection error and by 5.14 in terms of the ADD metric, without increasing the computational time.
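The keypoint localization step described above can be illustrated with a small sketch. It assumes each object pixel carries a 2D direction vector pointing toward the keypoint and recovers the keypoint as the least-squares intersection of the resulting lines. This is a deliberately simplified stand-in: PVNet itself uses RANSAC-based voting, and the abstract does not specify the exact procedure used here. The function name `locate_keypoint` and its arguments are hypothetical.

```python
import math


def locate_keypoint(pixels, directions):
    """Least-squares intersection of 2D lines, one line per object pixel.

    pixels:     list of (x, y) image coordinates of segmented object pixels.
    directions: list of (dx, dy) vectors, each pointing from its pixel
                toward the keypoint (need not be unit length).
    NOTE: illustrative simplification, not the voting scheme of the paper.
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (px, py), (dx, dy) in zip(pixels, directions):
        norm = math.hypot(dx, dy)
        if norm == 0.0:
            continue  # a zero vector defines no line; skip it
        dx, dy = dx / norm, dy / norm
        # M = I - d d^T projects onto the normal of the line through the pixel.
        m11 = 1.0 - dx * dx
        m12 = -dx * dy
        m22 = 1.0 - dy * dy
        # Accumulate the normal equations: (sum M) x = sum (M p).
        a11 += m11
        a12 += m12
        a22 += m22
        b1 += m11 * px + m12 * py
        b2 += m12 * px + m22 * py
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("direction vectors are (near-)parallel")
    # Solve the symmetric 2x2 system by Cramer's rule.
    x = (a22 * b1 - a12 * b2) / det
    y = (a11 * b2 - a12 * b1) / det
    return x, y


if __name__ == "__main__":
    # Three pixels whose direction vectors all point at the keypoint (5, 3).
    pts = [(0.0, 0.0), (10.0, 0.0), (5.0, 10.0)]
    dirs = [(5.0, 3.0), (-5.0, 3.0), (0.0, -7.0)]
    print(locate_keypoint(pts, dirs))
```

A plain least-squares fit like this treats every pixel equally, so pixels wrongly included in the object segment pull the estimate away from the true keypoint. This is one way to see why segmentation quality matters for the subsequent PnP step.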
Data Availability
The data that support the findings of this study are publicly available in the online repository: https://bop.felk.cvut.cz/datasets/.
Acknowledgements
This work was supported by a National Research Foundation of Korea (NRF) grant funded by the Korean Government through the MSIT (Grant 2021R1F1A1045749).
Ethics declarations
Conflicts of interest
We have no conflict of interest to declare.
About this article
Cite this article
Ye, Y., Park, H. Focal segmentation for robust 6D object pose estimation. Multimed Tools Appl 83, 47563–47585 (2024). https://doi.org/10.1007/s11042-023-16937-y
DOI: https://doi.org/10.1007/s11042-023-16937-y