
Focal segmentation for robust 6D object pose estimation

Published in Multimedia Tools and Applications

Abstract

In augmented reality, 6D pose estimation of rigid objects remains a challenging problem. Most previous 6D pose estimation methods train deep neural networks to directly regress poses from input images or to predict the 2D locations of 3D keypoints for pose estimation; consequently, they are vulnerable to large occlusions. This study addresses the challenge of 6D pose estimation from a single RGB image under severe occlusion. A novel method is proposed that builds on PVNet and improves its performance. As in PVNet, our method regresses target object segments and pixel-wise direction vectors from an RGB image. The 2D locations of 3D keypoints are then computed from the direction vectors of the object pixels, and the 6D object pose is obtained using a PnP algorithm. However, accurately segmenting object pixels is difficult, particularly under severe occlusion. To this end, a focal segmentation mechanism is proposed that enables accurate and complete segmentation of occluded objects. Extensive experiments on the LINEMOD and LINEMOD-Occlusion datasets validate the effectiveness and superiority of our method. It improves the accuracy of PVNet by 1.09 and 5.14 on average in terms of the 2D reprojection and ADD metrics, respectively, without increasing the computational time.
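
To make the pipeline concrete, the following is a minimal sketch, not the authors' implementation, of the two steps named above: a focal-style segmentation loss in the spirit of Lin et al.'s focal loss [19], which down-weights easy pixels so that training concentrates on hard, occluded object pixels, and the final PnP step that recovers the 6D pose from 2D-3D keypoint correspondences [31]. The keypoint voting itself follows PVNet [18] and is omitted here; all function names and hyperparameters (alpha, gamma, the EPnP flag) are illustrative assumptions.

    import numpy as np
    import cv2
    import torch
    import torch.nn.functional as F

    def focal_segmentation_loss(logits, target, alpha=0.25, gamma=2.0):
        """Per-pixel binary focal loss for object segmentation (a sketch;
        the paper's focal segmentation mechanism may differ in detail).
        logits: (B, H, W) raw scores; target: (B, H, W) floats in {0, 1}."""
        bce = F.binary_cross_entropy_with_logits(logits, target, reduction="none")
        p = torch.sigmoid(logits)
        p_t = p * target + (1 - p) * (1 - target)             # prob. of the true class
        alpha_t = alpha * target + (1 - alpha) * (1 - target) # class-balance weight
        return (alpha_t * (1 - p_t) ** gamma * bce).mean()    # down-weight easy pixels

    def pose_from_keypoints(kps_3d, kps_2d, K):
        """Recover the 6D pose from voted keypoints with a PnP solver.
        kps_3d: (N, 3) keypoints on the object model; kps_2d: (N, 2) their
        voted 2D image locations; K: (3, 3) camera intrinsic matrix."""
        ok, rvec, tvec = cv2.solvePnP(
            kps_3d.astype(np.float64), kps_2d.astype(np.float64),
            K.astype(np.float64), distCoeffs=None,
            flags=cv2.SOLVEPNP_EPNP)                          # EPnP solver, as in [31]
        R, _ = cv2.Rodrigues(rvec)                            # axis-angle to 3x3 rotation
        return R, tvec

In this sketch, the 2D keypoint locations passed to solvePnP would come from PVNet-style RANSAC voting over the predicted pixel-wise direction vectors; the focal weighting (1 - p_t)^gamma is what shifts the segmentation toward hard, partially occluded pixels.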



Data Availability

The data that support the findings of this study are publicly available in the online repository: https://bop.felk.cvut.cz/datasets/.

References

  1. Drummond T, Cipolla R (2002) Real-time visual tracking of complex structures. IEEE Trans Pattern Anal Mach Intell 24(7):932–946. https://doi.org/10.1109/TPAMI.2002.1017620

  2. Gu R, Wang G, Hwang J-n (2019) Efficient multi-person hierarchical 3D pose estimation for autonomous driving. In: 2019 IEEE conference on multimedia information processing and retrieval (MIPR), pp 163–168. https://doi.org/10.1109/MIPR.2019.00036

  3. Kothari N, Gupta M, Vachhani L, Arya H (2017) Pose estimation for an autonomous vehicle using monocular vision. In: 2017 Indian control conference (ICC), pp 424–431. https://doi.org/10.1109/INDIANCC.2017.7846512

  4. Zhang S, Song C, Radkowski R (2019) Setforge - synthetic RGB-D training data generation to support CNN-based pose estimation for augmented reality. In: 2019 IEEE international symposium on mixed and augmented reality adjunct (ISMAR-Adjunct), pp 237–242. https://doi.org/10.1109/ISMAR-Adjunct.2019.00-39

  5. Lu Y, Kourian S, Salvaggio C, Xu C, Lu G (2019) Single image 3D vehicle pose estimation for augmented reality. In: 2019 IEEE global conference on signal and information processing (GlobalSIP), pp 1–5. https://doi.org/10.1109/GlobalSIP45357.2019.8969201

  6. Hachiuma R, Saito H (2016) Recognition and pose estimation of primitive shapes from depth images for spatial augmented reality. In: 2016 IEEE 2nd workshop on everyday virtual reality (WEVR), pp 32–35. https://doi.org/10.1109/WEVR.2016.7859541

  7. Li X, Ling H (2020) Hybrid camera pose estimation with online partitioning for SLAM. IEEE Robot Autom Lett 5:1453–1460. https://doi.org/10.1109/LRA.2020.2967688

  8. Ruan X, Wang F, Huang J (2019) Relative pose estimation of visual SLAM based on convolutional neural networks. In: 2019 Chinese control conference (CCC), pp 8827–8832. https://doi.org/10.23919/ChiCC.2019.8870974

  9. Xiao Z, Wang X, Wang J, Wu Z (2017) Monocular ORB SLAM based on initialization by marker pose estimation. In: 2017 IEEE international conference on information and automation (ICIA), pp 678–682. https://doi.org/10.1109/ICInfA.2017.8078992

10. Malyavej V, Torteeka P, Wongkharn S, Wiangtong T (2009) Pose estimation of unmanned ground vehicle based on dead-reckoning/GPS sensor fusion by unscented Kalman filter. In: 2009 6th International conference on electrical engineering/electronics, computer, telecommunications and information technology, vol 1, pp 395–398. https://doi.org/10.1109/ECTICON.2009.5137033

  11. Wang C, Xu D, Zhu Y, Martin-Martin R, Lu C, Fei-Fei L, Savarese S (2019) DenseFusion: 6D object pose estimation by iterative dense fusion. In: 2019 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 3338–3347. https://doi.org/10.1109/CVPR.2019.00346

  12. Park K, Patten T, Vincze M (2019) Pix2Pose: Pixel-wise coordinate regression of objects for 6D pose estimation. In: 2019 IEEE/CVF international conference on computer vision (ICCV), pp 7667–7676. https://doi.org/10.1109/ICCV.2019.00776

  13. Kehl W, Manhardt F, Tombari F, Ilic S, Navab N (2017) SSD-6D: Making rgb-based 3D detection and 6D pose estimation great again. In: 2017 IEEE international conference on computer vision (ICCV), pp 1530–1538. https://doi.org/10.1109/ICCV.2017.169

  14. Redmon J, Farhadi A (2018) YOLOv3: An incremental improvement. arXiv:1804.02767

  15. Rad M, Lepetit V (2017) BB8: A scalable, accurate, robust to partial occlusion method for predicting the 3D poses of challenging objects without using depth. In: 2017 IEEE international conference on computer vision (ICCV), pp 3848–3856. https://doi.org/10.1109/ICCV.2017.413

  16. Kendall A, Grimes M, Cipolla R (2015) PoseNet: A convolutional network for real-time 6-DOF camera relocalization. In: 2015 IEEE international conference on computer vision (ICCV), pp 2938–2946. https://doi.org/10.1109/ICCV.2015.336

  17. Xiang Y, Schmidt T, Narayanan V, Fox D (2018) PoseCNN: A convolutional neural network for 6D object pose estimation in cluttered scenes. arXiv:1711.00199

  18. Peng S, Zhou X, Liu Y, Lin H, Huang Q, Bao H (2022) PVNet: Pixel-wise voting network for 6DoF object pose estimation. IEEE Trans Pattern Anal Mach Intell 44(06):3212–3223. https://doi.org/10.1109/TPAMI.2020.3047388

  19. Lin T, Goyal P, Girshick R, He K, Dollar P (2020) Focal loss for dense object detection. IEEE Trans Pattern Anal Mach Intell 42(02):318–327. https://doi.org/10.1109/TPAMI.2018.2858826

  20. Doumanoglou A, Kouskouridas R, Malassiotis S, Kim T-K (2016) Recovering 6D object pose and predicting next-best-view in the crowd. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 3583–3592. https://doi.org/10.1109/CVPR.2016.390

  21. Hinterstoisser S, Lepetit V, Rajkumar N, Konolige K (2016) Going further with point pair features. In: Leibe B, Matas J, Sebe N, Welling M (eds) Computer vision – ECCV 2016, pp 834–848

  22. Tekin B, Sinha S, Fua P (2018) Real-time seamless single shot 6D object pose prediction. In: 2018 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 292–301. https://doi.org/10.1109/CVPR.2018.00038

23. Zhao Z, Peng G, Wang H, Fang H-S, Li C, Lu C (2018) Estimating 6D pose from localizing designated surface keypoints. https://doi.org/10.48550/arXiv.1812.01387

  24. Zakharov S, Shugurov I, Ilic S (2019) DPOD: 6D pose object detector and refiner. In: 2019 IEEE/CVF international conference on computer vision (ICCV), pp 1941–1950. https://doi.org/10.1109/ICCV.2019.00203

  25. Song C, Song J, Huang Q (2020) HybridPose: 6D object pose estimation under hybrid representations. In: 2020 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 428–437. https://doi.org/10.1109/CVPR42600.2020.00051

  26. Haugaard R, Buch A (2022) SurfEmb: Dense and continuous correspondence distributions for object pose estimation with learnt surface embeddings. In: 2022 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 6739–6748. https://doi.org/10.1109/CVPR52688.2022.00663

  27. Do T-T, Cai M, Pham T, Reid I (2018) Deep-6DPose: Recovering 6D object pose from a single RGB image. https://doi.org/10.48550/arXiv.1802.10367

  28. Wang G, Manhardt F, Tombari F, Ji X (2021) GDR-Net: Geometry-guided direct regression network for monocular 6D object pose estimation. In: 2021 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 16606–16616. https://doi.org/10.1109/CVPR46437.2021.01634

  29. Pavlakos G, Zhou X, Chan A, Derpanis KG, Daniilidis K (2017) 6-DoF object pose from semantic keypoints. In: 2017 IEEE international conference on robotics and automation (ICRA), pp 2011–2018. https://doi.org/10.1109/ICRA.2017.7989233

  30. Oberweger M, Rad M, Lepetit V (2018) Making deep heatmaps robust to partial occlusions for 3D object pose estimation. In: Ferrari V, Hebert M, Sminchisescu C, Weiss Y (eds) Computer vision – ECCV 2018, pp 125–141

  31. Lepetit V, Moreno-Noguer F, Fua P (2009) EPnP: An accurate O(n) solution to the PnP problem. Int J Comput Vis 81(2):155–166. https://doi.org/10.1007/s11263-008-0152-6

  32. Girshick R (2015) Fast R-CNN. In: 2015 IEEE international conference on computer vision (ICCV), pp 1440–1448. https://doi.org/10.1109/ICCV.2015.169

  33. https://github.com/zju3dv/pvnet. Accessed 12 Dec 2022

  34. Hinterstoisser S, Cagniart C, Ilic S, Sturm P, Navab N, Fua P, Lepetit V (2012) Gradient response maps for real-time detection of textureless objects. IEEE Trans Pattern Anal Mach Intell 34(5):876–888. https://doi.org/10.1109/TPAMI.2011.206

  35. Brachmann E, Krull A, Michel F, Gumhold S, Shotton J, Rother C (2014) Learning 6D object pose estimation using 3D object coordinates. In: Fleet D, Pajdla T, Schiele B, Tuytelaars T (eds) Computer vision – ECCV 2014, pp 536–551

  36. Brachmann E, Michel F, Krull A, Yang MY, Gumhold S, Rother C (2016) Uncertainty-driven 6D pose estimation of objects and scenes from a single RGB image. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), pp 3364–3372. https://doi.org/10.1109/CVPR.2016.366

  37. Hinterstoisser S, Lepetit V, Ilic S, Holzer S, Bradski G, Konolige K, Navab N (2013) Model based training, detection and pose estimation of texture-less 3D objects in heavily cluttered scenes. In: Lee KM, Matsushita Y, Rehg JM, Hu Z (eds) Computer vision – ACCV 2012, pp 548–562

  38. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), pp 770–778. https://doi.org/10.1109/CVPR.2016.90

  39. Li Z, Peng C, Yu G, Zhang X, Deng Y, Sun J (2018) DetNet: Design backbone for object detection. In: Ferrari V, Hebert M, Sminchisescu C, Weiss Y (eds) Computer vision – ECCV 2018, pp 339–354

Acknowledgements

This work was supported by a National Research Foundation of Korea (NRF) grant funded by the Korean Government (MSIT) under Grant 2021R1F1A1045749.

Author information

Corresponding author

Correspondence to Hanhoon Park.

Ethics declarations

Conflicts of interest

We have no conflict of interest to declare.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ye, Y., Park, H. Focal segmentation for robust 6D object pose estimation. Multimed Tools Appl 83, 47563–47585 (2024). https://doi.org/10.1007/s11042-023-16937-y

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11042-023-16937-y

Keywords

Navigation