Geometric Correspondence Fields: Learned Differentiable Rendering for 3D Pose Refinement in the Wild

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12361)

Abstract

We present a novel 3D pose refinement approach based on differentiable rendering for objects of arbitrary categories in the wild. In contrast to previous methods, we make two main contributions: First, instead of comparing real-world images and synthetic renderings in the RGB or mask space, we compare them in a feature space optimized for 3D pose refinement. Second, we introduce a novel differentiable renderer that learns to approximate the rasterization backward pass from data instead of relying on a hand-crafted algorithm. For this purpose, we predict deep cross-domain correspondences between RGB images and 3D model renderings in the form of what we call geometric correspondence fields. These correspondence fields serve as pixel-level gradients, which are analytically propagated backward through the rendering pipeline to perform a gradient-based optimization directly on the 3D pose. In this way, we precisely align 3D models to objects in RGB images, which results in significantly improved 3D pose estimates. We evaluate our approach on the challenging Pix3D dataset and achieve up to 55% relative improvement over state-of-the-art refinement methods on multiple metrics.
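
The idea of using 2D correspondences as pixel-level gradients on the pose can be illustrated with a small sketch. This is not the authors' implementation: the learned correspondence network is replaced by a hypothetical oracle (`oracle_field`) that returns the exact 2D displacement from each projected model point to its target location, the renderer is reduced to a pinhole projection of sparse model points with focal length 1, and a Gauss-Newton least-squares update stands in for whichever gradient-based optimizer the paper uses. All function names are assumptions made for illustration.

```python
import numpy as np


def rodrigues(w):
    """Rotation matrix for an axis-angle vector w of shape (3,)."""
    theta = np.linalg.norm(w)
    K = np.array([[0.0, -w[2], w[1]],
                  [w[2], 0.0, -w[0]],
                  [-w[1], w[0], 0.0]])
    if theta < 1e-12:
        return np.eye(3) + K  # first-order approximation near zero
    return (np.eye(3) + np.sin(theta) / theta * K
            + (1.0 - np.cos(theta)) / theta**2 * (K @ K))


def project(points, R, t):
    """Pinhole projection (focal length 1) of model points (N, 3)."""
    rotated = points @ R.T
    cam = rotated + t
    return cam[:, :2] / cam[:, 2:3], rotated, cam


def pose_jacobian(rotated, cam):
    """Analytic d(image point)/d(pose), shape (N, 2, 6).

    Pose increment = [dw (3), dt (3)], applied as R <- exp([dw]x) R, t <- t + dt.
    """
    n = len(cam)
    x, y, z = cam.T
    J_proj = np.zeros((n, 2, 3))          # d(u, v) / d(camera-frame point)
    J_proj[:, 0, 0] = 1.0 / z
    J_proj[:, 0, 2] = -x / z**2
    J_proj[:, 1, 1] = 1.0 / z
    J_proj[:, 1, 2] = -y / z**2
    rx, ry, rz = rotated.T
    zeros = np.zeros(n)
    J_rot = np.stack([                     # -[R p]x, since dw x (R p) = -[R p]x dw
        np.stack([zeros, rz, -ry], axis=1),
        np.stack([-rz, zeros, rx], axis=1),
        np.stack([ry, -rx, zeros], axis=1),
    ], axis=1)
    J_trans = np.broadcast_to(np.eye(3), (n, 3, 3))
    return J_proj @ np.concatenate([J_rot, J_trans], axis=2)


def refine_pose(points, R, t, predict_field, iters=15):
    """Iteratively update (R, t) from 2D displacements given by predict_field."""
    for _ in range(iters):
        uv, rotated, cam = project(points, R, t)
        field = predict_field(uv)          # (N, 2) displacement per projected point
        J = pose_jacobian(rotated, cam).reshape(-1, 6)
        # Gauss-Newton step: pose increment that best explains the displacements.
        delta, *_ = np.linalg.lstsq(J, field.reshape(-1), rcond=None)
        R = rodrigues(delta[:3]) @ R
        t = t + delta[3:]
    return R, t


# Synthetic demo: perturb a known pose and refine it back.
rng = np.random.default_rng(0)
points = rng.uniform(-0.5, 0.5, size=(50, 3))
R_gt, t_gt = rodrigues(np.array([0.3, -0.2, 0.1])), np.array([0.1, -0.1, 4.0])
uv_target, _, _ = project(points, R_gt, t_gt)

R0 = rodrigues(np.array([0.1, 0.15, -0.1])) @ R_gt
t0 = t_gt + np.array([0.2, -0.1, 0.3])


def oracle_field(uv):
    """Hypothetical stand-in for the learned correspondence field."""
    return uv_target - uv


R_ref, t_ref = refine_pose(points, R0, t0, oracle_field)
err_before = np.abs(project(points, R0, t0)[0] - uv_target).mean()
err_after = np.abs(project(points, R_ref, t_ref)[0] - uv_target).mean()
print("mean reprojection error before / after:", err_before, err_after)
```

In this sketch the displacements are exact, so the refinement only demonstrates how 2D correspondences are pushed through the projection Jacobian onto the six pose parameters. With a learned, dense field the displacements would be noisy predictions evaluated at rendered pixels rather than at sparse model points.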

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Graz University of Technology, Graz, Austria
  2. Facebook Inc., Menlo Park, USA
