LaTeRF: Label and Text Driven Object Radiance Fields

Conference paper in Computer Vision – ECCV 2022 (ECCV 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13663)

Abstract

Obtaining 3D object representations is important for creating photo-realistic simulations and for collecting AR and VR assets. Neural fields have shown their effectiveness in learning a continuous volumetric representation of a scene from 2D images, but acquiring object representations from these models with weak supervision remains an open challenge. In this paper, we introduce LaTeRF, a method for extracting an object of interest from a scene given 2D images of the entire scene, known camera poses, a natural language description of the object, and a set of point labels of object and non-object points in the input images. To faithfully extract the object from the scene, LaTeRF extends the NeRF formulation with an additional ‘objectness’ probability at each 3D point. Additionally, we leverage the rich latent space of a pre-trained CLIP model, combined with our differentiable object renderer, to inpaint the occluded parts of the object. We demonstrate high-fidelity object extraction on both synthetic and real-world datasets and justify our design choices through an extensive ablation study.
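To make the ‘objectness’ extension concrete, the sketch below shows how a per-point objectness probability could gate NeRF-style volume rendering when compositing only the object. This is a minimal illustration based solely on the abstract's description; the function and argument names (`composite_ray`, `obj_logit`, `object_only`) are our assumptions, not the authors' released code.

```python
# Minimal sketch (assumed names, not the authors' implementation): a ray
# compositor where an extra per-point 'objectness' logit, predicted alongside
# colour and density, suppresses non-object density when rendering the object.
import torch

def composite_ray(rgb, sigma, obj_logit, deltas, object_only=False):
    """Volume-render one ray of N samples.

    rgb:       (N, 3) colour at each sample point
    sigma:     (N,)   volume density at each sample point
    obj_logit: (N,)   hypothetical extra 'objectness' head on the NeRF MLP
    deltas:    (N,)   distances between consecutive samples
    """
    p_obj = torch.sigmoid(obj_logit)               # objectness probability in (0, 1)
    if object_only:
        sigma = sigma * p_obj                      # drive non-object density toward 0
    alpha = 1.0 - torch.exp(-sigma * deltas)       # per-sample opacity
    trans = torch.cumprod(                         # transmittance T_i = prod_{j<i}(1 - alpha_j)
        torch.cat([alpha.new_ones(1), 1.0 - alpha + 1e-10])[:-1], dim=0)
    weights = alpha * trans
    return (weights[:, None] * rgb).sum(dim=0)     # composited pixel colour
```

Because this compositor is differentiable, the labeled object and non-object pixels can supervise `p_obj` directly with a classification loss, and a CLIP similarity score between object-only renders and the text description can supply the inpainting signal for occluded parts, as the abstract describes.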

Author information

Corresponding author

Correspondence to Ashkan Mirzaei.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 2951 KB)

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Mirzaei, A., Kant, Y., Kelly, J., Gilitschenski, I. (2022). LaTeRF: Label and Text Driven Object Radiance Fields. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13663. Springer, Cham. https://doi.org/10.1007/978-3-031-20062-5_2

  • DOI: https://doi.org/10.1007/978-3-031-20062-5_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20061-8

  • Online ISBN: 978-3-031-20062-5

  • eBook Packages: Computer Science, Computer Science (R0)
