BTD-RF: 3D scene reconstruction using block-term tensor decomposition


Abstract

The Neural Radiance Field (NeRF) achieves excellent performance on view synthesis tasks, but it requires large amounts of memory and many model parameters for three-dimensional (3D) scene reconstruction. This paper proposes the block-term tensor decomposition radiance field (BTD-RF), a novel approach that achieves significant model compression while preserving reconstruction quality. BTD-RF decomposes the high-dimensional radiance field into low-dimensional tensor blocks, yielding a model 2.21 times smaller than the baseline method. This decomposition also allows the standard multi-head attention of transformers to be replaced with a lightweight multi-linear attention mechanism that employs element-wise products and parameter sharing, significantly reducing model complexity without compromising performance. Extensive evaluations on various datasets demonstrate that BTD-RF achieves superior image reconstruction quality compared to prior methods. Quantitative metrics and qualitative assessments confirm that BTD-RF generates images that are structurally and perceptually close to the ground truth, despite its lightweight design. BTD-RF thus offers a compelling trade-off between model size and reconstruction quality for 3D scene reconstruction; its efficient design makes it suitable for resource-constrained applications while delivering high-fidelity results, paving the way for broader NeRF adoption. The code is available at https://github.com/seonbin-kim/BTDRF.
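
The two mechanisms summarized above are easier to see in code. The first sketch below shows a generic block-term decomposition of a 3-way grid: the field is stored as R small core tensors plus factor matrices and is re-expanded on demand. This is a minimal NumPy illustration of the general technique, not the released BTD-RF implementation; the helper name btd_reconstruct, the ranks, and the grid size are our assumptions.

    import numpy as np

    def btd_reconstruct(cores, A, B, C):
        # Hypothetical helper (not from the BTD-RF repo): rebuild a
        # 3-way tensor as T = sum_r G_r x1 A_r x2 B_r x3 C_r, where
        # G_r is a small (L, M, N) core and A_r (I, L), B_r (J, M),
        # C_r (K, N) are the factor matrices of block r.
        I, J, K = A[0].shape[0], B[0].shape[0], C[0].shape[0]
        T = np.zeros((I, J, K))
        for G, Ar, Br, Cr in zip(cores, A, B, C):
            T += np.einsum('lmn,il,jm,kn->ijk', G, Ar, Br, Cr)
        return T

    # Toy numbers: a 64^3 grid stored as 3 rank-(4, 4, 4) block terms
    # needs 3 * (4**3 + 3 * 64 * 4) = 2496 parameters instead of
    # 64**3 = 262144, which is where the compression comes from.
    R, I, L = 3, 64, 4
    cores = [np.random.randn(L, L, L) for _ in range(R)]
    A = [np.random.randn(I, L) for _ in range(R)]
    B = [np.random.randn(I, L) for _ in range(R)]
    C = [np.random.randn(I, L) for _ in range(R)]
    grid = btd_reconstruct(cores, A, B, C)  # shape (64, 64, 64)

The second sketch illustrates the attention substitution in the spirit of the tensorized transformer of Ma et al. [32]: every block shares one set of query/key/value projections, and each block owns only a small diagonal core applied by element-wise products, so each extra block costs d parameters rather than three d-by-d matrices per head. Again, the function name and the diagonal-core simplification are assumptions made for illustration, not the paper's exact layer.

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def multilinear_attention(X, Wq, Wk, Wv, diag_cores):
        # Shared projections across all block terms (parameter sharing).
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        d = Q.shape[-1]
        out = np.zeros_like(Q)
        for g in diag_cores:  # g: (d,) diagonal core of one block
            # Element-wise scaling by the core, then ordinary attention.
            scores = softmax((Q * g) @ K.T / np.sqrt(d))
            out += scores @ V
        return out / len(diag_cores)  # average the block terms

    # Usage with toy shapes: 8 samples along a ray, feature width 16.
    n, d = 8, 16
    X = np.random.randn(n, d)
    Wq, Wk, Wv = (np.random.randn(d, d) for _ in range(3))
    y = multilinear_attention(X, Wq, Wk, Wv,
                              diag_cores=[np.random.randn(d) for _ in range(2)])

Under these assumptions the memory saving comes entirely from the factorized storage and the shared projections; reconstruction quality then hinges on the chosen block ranks.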


Availability of data and materials

The data that support the findings of this study are available on request from the corresponding authors of references [1, 21, 39–41].

Code availability

Researchers or interested parties are welcome to contact the corresponding author, B.C.K., who can provide further explanation and the Python code upon request.

References

  1. Mildenhall B, Srinivasan P, Tancik M, Barron J, Ramamoorthi R, Ng R (2020) Nerf: Representing scenes as neural radiance fields for view synthesis. In: Proceedings of the European Conference on Computer Vision, pp. 1–17

  2. Kerbl B, Kopanas G, Leimkühler T, Drettakis G (2023) 3d gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics 42(4):1–14

  3. Wang P, Liu Y, Chen Z, Liu L, Liu Z, Komura T, Theobalt C, Wang W (2023) F2-nerf: Fast neural radiance field training with free camera trajectories. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 4150–4159

  4. Chen Z, Funkhouser T, Hedman P, Tagliasacchi A (2023) Mobilenerf: Exploiting the polygon rasterization pipeline for efficient neural field rendering on mobile architectures. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 16569–16578

  5. Chen A, Xu Z, Geiger A, Yu J, Su H (2022) Tensorf: Tensorial radiance fields. In: Proceedings of the European Conference on Computer Vision, pp. 333–350

  6. Yang J, Pavone M, Wang Y (2023) Freenerf: Improving few-shot neural rendering with free frequency regularization. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 8254–8263

  7. Li X, Cao Z, Sun H, Zhang J, Xian K, Lin G (2023) 3d cinemagraphy from a single image. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 4595–4605

  8. Niemeyer M, Barron JT, Mildenhall B, Sajjadi MS, Geiger A, Radwan N (2022) Regnerf: Regularizing neural radiance fields for view synthesis from sparse inputs. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 5480–5490

  9. Barron JT, Mildenhall B, Verbin D, Srinivasan PP, Hedman P (2023) Zipnerf: Anti-aliased grid-based neural radiance fields. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 19697–19705

  10. Xiangli Y, Xu L, Pan X, Zhao N, Rao A, Theobalt C, Dai B, Lin D (2022) Bungeenerf: Progressive neural radiance field for extreme multi-scale scene rendering. In: Proceedings of the European Conference on Computer Vision, pp. 106–122

  11. Barron JT, Mildenhall B, Verbin D, Srinivasan PP, Hedman P (2022) Mip-nerf 360: Unbounded anti-aliased neural radiance fields. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 5470–5479

  12. Kolda TG, Bader BW (2009) Tensor decompositions and applications. SIAM Review 51(3):455–500

  13. De Lathauwer L (2008) Decompositions of a higher-order tensor in block terms - Part II: Definitions and uniqueness. SIAM J Matrix Anal Appl 30(3):1033–1066

  14. Yu A, Li R, Tancik M, Li H, Ng R, Kanazawa A (2021) Plenoctrees for real-time rendering of neural radiance fields. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 5752–5761

  15. Fridovich-Keil S, Yu A, Tancik M, Chen Q, Recht B, Kanazawa A (2022) Plenoxels: Radiance fields without neural networks. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 5501–5510

  16. Reiser C, Peng S, Liao Y, Geiger A (2021) Kilonerf: Speeding up neural radiance fields with thousands of tiny mlps. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 14335–14345

  17. Sun C, Sun M, Chen HT (2022) Direct voxel grid optimization: Super-fast convergence for radiance fields reconstruction. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 5459–5469

  18. Müller T, Evans A, Schied C, Keller A (2022) Instant neural graphics primitives with a multiresolution hash encoding. ACM Transactions on Graphics 41(4):1–15

  19. Yu A, Ye V, Tancik M, Kanazawa A (2021) pixelnerf: Neural radiance fields from one or few images. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 4578–4587

  20. Deng K, Liu A, Zhu JY, Ramanan D (2022) Depth-supervised nerf: Fewer views and faster training for free. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 12882–12891

  21. Verbin D, Hedman P, Mildenhall B, Zickler T, Barron JT, Srinivasan PP (2022) Ref-nerf: Structured view-dependent appearance for neural radiance fields. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 5481–5490

  22. Varma M, Wang P, Chen X, Chen T, Venugopalan S, Wang Z (2023) Is attention all that nerf needs? In: International Conference on Learning Representations, pp. 1–22

  23. Xu Q, Xu Z, Philip J, Bi S, Shu Z, Sunkavalli K, Neumann U (2022) Point-nerf: Point-based neural radiance fields. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 5438–5448

  24. Kulhanek J, Sattler T (2023) Tetra-NeRF: Representing neural radiance fields using tetrahedra. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 18458–18469

  25. Tang J, Chen X, Wang J, Zeng G (2022) Compressible-composable nerf via rank residual decomposition. In: Koyejo S, Mohamed S, Agarwal A, Belgrave D, Cho K, Oh A (eds.) Advances in Neural Information Processing Systems, vol. 35, pp. 14798–14809

  26. Shi J, Guillemot C (2023) Light field compression via compact neural scene representation. In: IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 1–5

  27. Gordon C, Chng SF, MacDonald L, Lucey S (2023) On quantizing implicit neural representations. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 341–350

  28. Barron JT, Mildenhall B, Tancik M, Hedman P, Martin-Brualla R, Srinivasan PP (2021) Mip-nerf: A multiscale representation for anti-aliasing neural radiance fields. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 5855–5864

  29. Fridovich-Keil S, Meanti G, Warburg FR, Recht B, Kanazawa A (2023) K-planes: Explicit radiance fields in space, time, and appearance. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 12479–12488

  30. Carroll JD, Chang JJ (1970) Analysis of individual differences in multidimensional scaling via an N-way generalization of “Eckart-Young” decomposition. Psychometrika 35(3):283–319

  31. Tucker LR (1966) Some mathematical notes on three-mode factor analysis. Psychometrika 31(3):279–311

  32. Ma X, Zhang P, Zhang S, Duan N, Hou Y, Zhou M, Song D (2019) A tensorized transformer for language modeling. Adv Neural Inf Process Syst 32:1–11

  33. Wu L, Lee JY, Bhattad A, Wang YX, Forsyth D (2022) Diver: Real-time and accurate neural radiance fields with deterministic integration for volume rendering. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 16200–16209

  34. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I (2017) Attention is all you need. Adv Neural Inf Process Syst 30:1–11

  35. Kajiya JT (1986) The rendering equation. In: Proceedings of the 13th Annual Conference on Computer Graphics and Interactive Techniques, pp. 143–150

  36. Kingma DP, Ba J (2015) Adam: A method for stochastic optimization. In: International Conference on Learning Representations, pp. 1–11

  37. Wang Z, Bovik AC, Sheikh HR, Simoncelli EP (2004) Image quality assessment: from error visibility to structural similarity. IEEE Trans Image Process 13(4):600–612

  38. Zhang R, Isola P, Efros AA, Shechtman E, Wang O (2018) The unreasonable effectiveness of deep features as a perceptual metric. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 586–595

  39. Liu L, Gu J, Zaw Lin K, Chua TS, Theobalt C (2020) Neural sparse voxel fields. Adv Neural Inf Process Syst 33:15651–15663

  40. Knapitsch A, Park J, Zhou QY, Koltun V (2017) Tanks and temples: Benchmarking large-scale scene reconstruction. ACM Transactions on Graphics 36(4):1–13

  41. Mildenhall B, Srinivasan PP, Ortiz-Cayon R, Kalantari NK, Ramamoorthi R, Ng R, Kar A (2019) Local light field fusion: Practical view synthesis with prescriptive sampling guidelines. ACM Transactions on Graphics 38(4):1–14

  42. Zhang Y, Huang X, Ni B, Li T, Zhang W (2023) Frequency-modulated point cloud rendering with easy editing. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 119–129

  43. Li S, Li H, Wang Y, Liao Y, Yu L (2023) Steernerf: Accelerating nerf rendering via smooth viewpoint trajectory. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 20701–20711

  44. Hu W, Wang Y, Ma L, Yang B, Gao L, Liu X, Ma Y (2023) Tri-miprf: Tri-mip representation for efficient anti-aliasing neural radiance fields. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 19774–19783

  45. Wu R, Mildenhall B, Henzler P, Park K, Gao R, Watson D, Srinivasan PP, Verbin D, Barron JT, Poole B, et al (2023) Reconfusion: 3d reconstruction with diffusion priors. arXiv:2312.02981

Funding

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (2022R1I1A3058128).

Author information

Contributions

S.B.K., S.K., and D.A. were responsible for the design and overall investigation. B.C.K. was responsible for the data curation, supervision, and writing and editing of the manuscript.

Corresponding author

Correspondence to Byoung Chul Ko.

Ethics declarations

Conflict of interest/Competing interests

The authors declare that they have no conflict of interest.

Ethics approval

Not applicable

Consent to participate

Not applicable

Consent for publication

Not applicable

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Kim, S.B., Kim, S., Ahn, D. et al. BTD-RF: 3D scene reconstruction using block-term tensor decomposition. Appl Intell (2024). https://doi.org/10.1007/s10489-024-05476-0
