Skip to main content

Neural Object Learning for 6D Pose Estimation Using a Few Cluttered Images

Part of the Lecture Notes in Computer Science book series (LNIP,volume 12349)

Abstract

Recent methods for 6D pose estimation of objects assume either textured 3D models or real images that cover the entire range of target poses. However, it is difficult to obtain textured 3D models and annotate the poses of objects in real scenarios. This paper proposes a method, Neural Object Learning (NOL), that creates synthetic images of objects in arbitrary poses by combining only a few observations from cluttered images. A novel refinement step is proposed to align inaccurate poses of objects in source images, which results in better quality images. Evaluations performed on two public datasets show that the rendered images created by NOL lead to state-of-the-art performance in comparison to methods that use 13 times the number of real images. Evaluations on our new dataset show multiple objects can be trained and recognized simultaneously using a sequence of a fixed scene.

Keywords

  • 6D pose estimation
  • Object learning
  • Object model
  • Object modeling
  • Differentiable rendering
  • Object recognition

This is a preview of subscription content, access via your institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • DOI: 10.1007/978-3-030-58548-8_38
  • Chapter length: 18 pages
  • Instant PDF download
  • Readable on all devices
  • Own it forever
  • Exclusive offer for individuals only
  • Tax calculation will be finalised during checkout
eBook
USD   109.00
Price excludes VAT (USA)
  • ISBN: 978-3-030-58548-8
  • Instant PDF download
  • Readable on all devices
  • Own it forever
  • Exclusive offer for individuals only
  • Tax calculation will be finalised during checkout
Softcover Book
USD   149.99
Price excludes VAT (USA)
Fig. 1.
Fig. 2.
Fig. 3.
Fig. 4.
Fig. 5.

Notes

  1. 1.

    https://www.acin.tuwien.ac.at/en/vision-for-robotics/software-tools/smot/.

  2. 2.

    https://github.com/kirumang/NOL.

References

  1. Brachmann, E., Krull, A., Michel, F., Gumhold, S., Shotton, J., Rother, C.: Learning 6D object pose estimation using 3D object coordinates. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8690, pp. 536–551. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10605-2_35

    CrossRef  Google Scholar 

  2. Brachmann, E., Michel, F., Krull, A., Ying Yang, M., Gumhold, S., Rother, C.: Uncertainty-driven 6d pose estimation of objects and scenes from a single RGB image. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3364–3372 (2016)

    Google Scholar 

  3. Calli, B., Walsman, A., Singh, A., Srinivasa, S.S., Abbeel, P., Dollar, A.M.: Benchmarking in manipulation research: using the Yale-CMU-Berkeley object and model set. IEEE Robot. Autom. Mag. (RAM) 22, 36–52 (2015)

    CrossRef  Google Scholar 

  4. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 248–255 (2009)

    Google Scholar 

  5. Everingham, M., Van Gool, L., Williams, C.K., Winn, J., Zisserman, A.: The pascal visual object classes (VOC) challenge. Int. J. Comput. Vision (IJCV) 88, 303–338 (2010). https://doi.org/10.1007/s11263-009-0275-4

    CrossRef  Google Scholar 

  6. Fu, Y., Yan, Q., Yang, L., Liao, J., Xiao, C.: Texture mapping for 3D reconstruction with RGB-D sensor. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4645–4653 (2018)

    Google Scholar 

  7. Fäulhammer, T., et al.: Autonomous learning of object models on a mobile robot. IEEE Robot. Autom. Lett. (RA-L) 2(1), 26–33 (2017). https://doi.org/10.1109/LRA.2016.2522086

  8. Goodfellow, I., et al.: Generative adversarial nets. In: Advances in Neural Information Processing Systems 27 (NeurIPS), pp. 2672–2680. Curran Associates, Inc. (2014)

    Google Scholar 

  9. He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 2961–2969 (2017)

    Google Scholar 

  10. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)

    Google Scholar 

  11. He, Y., Sun, W., Huang, H., Liu, J., Fan, H., Sun, J.: PVN3D: a deep point-wise 3D keypoints voting network for 6dof pose estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11632–11641 (2020)

    Google Scholar 

  12. Henderson, P., Ferrari, V.: Learning single-image 3D reconstruction by generative modelling of shape, pose and shading. Int. J. Comput. Vision 128(4), 835–854 (2019). https://doi.org/10.1007/s11263-019-01219-8

    CrossRef  Google Scholar 

  13. Hinterstoisser, S., et al.: Model based training, detection and pose estimation of texture-less 3D objects in heavily cluttered scenes. In: Lee, K.M., Matsushita, Y., Rehg, J.M., Hu, Z. (eds.) ACCV 2012. LNCS, vol. 7724, pp. 548–562. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-37331-2_42

    CrossRef  Google Scholar 

  14. Hodaň, T., et al.: BOP: benchmark for 6D object pose estimation (2019). https://bop.felk.cvut.cz. Visited on 21 February 2020

  15. Hodaň, T., Haluza, P., Obdržálek, Š., Matas, J., Lourakis, M., Zabulis, X.: T-LESS: an RGB-D dataset for 6D pose estimation of texture-less objects. In: IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 880–888 (2017)

    Google Scholar 

  16. Hodaň, T., Matas, J., Obdržálek, Š.: On evaluation of 6D object pose estimation. In: Hua, G., Jégou, H. (eds.) ECCV 2016. LNCS, vol. 9915, pp. 606–619. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-49409-8_52

    CrossRef  Google Scholar 

  17. Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4700–4708 (2017)

    Google Scholar 

  18. Johnson, J., Alahi, A., Fei-Fei, L.: Perceptual losses for real-time style transfer and super-resolution. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9906, pp. 694–711. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46475-6_43

    CrossRef  Google Scholar 

  19. Kaskman, R., Zakharov, S., Shugurov, I., Ilic, S.: HomebrewedDB: RGB-D dataset for 6d pose estimation of 3d objects. In: The IEEE International Conference on Computer Vision Workshops (ICCVW) (2019)

    Google Scholar 

  20. Kato, H., Ushiku, Y., Harada, T.: Neural 3D mesh renderer. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3907–3916 (2018)

    Google Scholar 

  21. Kehl, W., Manhardt, F., Tombari, F., Ilic, S., Navab, N.: SSD-6D: making RGB-based 3D detection and 6d pose estimation great again. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 1521–1529 (2017)

    Google Scholar 

  22. Krainin, M., Henry, P., Ren, X., Fox, D.: Manipulator and object tracking for in-hand 3D object modeling. Int. J. Robot. Res. (IJRR) 30(11), 1311–1327 (2011)

    CrossRef  Google Scholar 

  23. Li, Y., Wang, G., Ji, X., Xiang, Y., Fox, D.: DeepIM: deep iterative matching for 6D pose estimation. Int. J. Comput. Vis. 128(3), 657–678 (2019). https://doi.org/10.1007/s11263-019-01250-9

    CrossRef  Google Scholar 

  24. Li, Z., Wang, G., Ji, X.: CDPN: coordinates-based disentangled pose network for real-time RGB-based 6-DoF object pose estimation. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 7678–7687 (2019)

    Google Scholar 

  25. Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollar, P.: Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 2999–3007 (2017)

    Google Scholar 

  26. Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48

    CrossRef  Google Scholar 

  27. Park, K., Patten, T., Vincze, M.: Pix2Pose: pixel-wise coordinate regression of objects for 6D pose estimation. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 7668–7677 (2019)

    Google Scholar 

  28. Peng, S., Liu, Y., Huang, Q., Zhou, X., Bao, H.: PVNet: pixel-wise voting network for 6DofF pose estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4556–4565, June 2019

    Google Scholar 

  29. Philip, J., Drettakis, G.: Plane-based multi-view inpainting for image-based rendering in large scenes. In: Proceedings of the ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games, pp. 1–11 (2018)

    Google Scholar 

  30. Prankl, J., Aldoma, A., Svejda, A., Vincze, M.: RGB-D object modelling for object recognition and tracking. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 96–103 (2015)

    Google Scholar 

  31. Rad, M., Lepetit, V.: BB8: a scalable, accurate, robust to partial occlusion method for predicting the 3D poses of challenging objects without using depth. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 3828–3836 (2017)

    Google Scholar 

  32. Rad, M., Oberweger, M., Lepetit, V.: Domain transfer for 3D pose estimation from color images without manual annotations. In: Jawahar, C.V., Li, H., Mori, G., Schindler, K. (eds.) ACCV 2018. LNCS, vol. 11365, pp. 69–84. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-20873-8_5

    CrossRef  Google Scholar 

  33. Shum, H., Kang, S.B.: Review of image-based rendering techniques. In: Visual Communications and Image Processing 2000, vol. 4067, pp. 2–13. International Society for Optics and Photonics (2000)

    Google Scholar 

  34. Sundermeyer, M., Marton, Z.-C., Durner, M., Brucker, M., Triebel, R.: Implicit 3D orientation learning for 6D object detection from RGB images. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11210, pp. 712–729. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01231-1_43

    CrossRef  Google Scholar 

  35. Tekin, B., Sinha, S.N., Fua, P.: Real-time seamless single shot 6D object pose prediction. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 292–301 (2018)

    Google Scholar 

  36. Thies, J., Zollhöfer, M., Nieundefinedner, M.: Deferred neural rendering: Image synthesis using neural textures. ACM Trans. Graph. 38(4) (2019). https://doi.org/10.1145/3306346.3323035

  37. Thonat, T., Shechtman, E., Paris, S., Drettakis, G.: Multi-view inpainting for image-based scene editing and rendering. In: International Conference on 3D Vision (3DV), pp. 351–359. IEEE (2016)

    Google Scholar 

  38. Vidal, J., Lin, C.Y., Lladó, X., Martí, R.: A method for 6D pose estimation of free-form rigid objects using point pair features on range data. Sensors 18(8), 2678 (2018)

    CrossRef  Google Scholar 

  39. Waechter, M., Moehrle, N., Goesele, M.: Let there be color! large-scale texturing of 3D reconstructions. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 836–850. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_54

    CrossRef  Google Scholar 

  40. Wang, C., et al.: DenseFusion: 6D object pose estimation by iterative dense fusion. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3343–3352 (2019)

    Google Scholar 

  41. Wang, F., Hauser, K.: In-hand object scanning via RGB-D video segmentation. In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pp. 3296–3302 (2019)

    Google Scholar 

  42. Whyte, O., Sivic, J., Zisserman, A.: Get out of my picture! internet-based inpainting. In: British Machine Vision Conference (BMVC) (2009)

    Google Scholar 

  43. Xiang, Y., Schmidt, T., Narayanan, V., Fox, D.: PoseCNN: a convolutional neural network for 6D object pose estimation in cluttered scenes. Robotics: Science and Systems (RSS) (2018)

    Google Scholar 

  44. Yu, J., Fan, Y., Yang, J., Xu, N., Wang, Z., Wang, X., Huang, T.: Wide activation for efficient and accurate image super-resolution. arXiv preprint arXiv:1808.08718 (2018)

  45. Zakharov, S., Shugurov, I., Ilic, S.: DPOD: 6D pose object detector and refiner. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 1941–1950 (2019)

    Google Scholar 

  46. Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreasonable effectiveness of deep features as a perceptual metric. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 586–595 (2018)

    Google Scholar 

  47. Zhou, Q.Y., Park, J., Koltun, V.: Open3D: a modern library for 3D data processing. arXiv:1801.09847 (2018)

Download references

Acknowledgment

The research leading to these results has partially funded by the Austrian Science Fund (FWF) under grant agreement No. I3969-N30 (InDex) and the Austrian Research Promotion Agency (FFG) under grant agreement No. 879878 (K4R).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Kiru Park .

Editor information

Editors and Affiliations

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 2 (mp4 47335 KB)

Supplementary material 1 (pdf 18976 KB)

Rights and permissions

Reprints and Permissions

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Verify currency and authenticity via CrossMark

Cite this paper

Park, K., Patten, T., Vincze, M. (2020). Neural Object Learning for 6D Pose Estimation Using a Few Cluttered Images. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science(), vol 12349. Springer, Cham. https://doi.org/10.1007/978-3-030-58548-8_38

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-58548-8_38

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58547-1

  • Online ISBN: 978-3-030-58548-8

  • eBook Packages: Computer ScienceComputer Science (R0)