Abstract
The Neural Radiance Field (NeRF) achieves excellent performance on view-synthesis tasks, but it requires a large amount of memory and many model parameters for three-dimensional (3D) scene reconstruction. This paper proposes the block-term tensor decomposition radiance field (BTD-RF), a novel approach that achieves substantial model compression while preserving reconstruction quality. BTD-RF decomposes the high-dimensional radiance field into low-dimensional tensor blocks, yielding a model 2.21 times smaller than the baseline method. This decomposition also allows the standard multi-head attention of transformers to be replaced with a lightweight multi-linear attention mechanism that employs element-wise products and parameter sharing, significantly reducing model complexity without compromising performance. Extensive evaluations on various datasets demonstrate that BTD-RF achieves superior image reconstruction quality compared with prior methods. Quantitative metrics and qualitative assessments confirm that BTD-RF generates images that are structurally and perceptually close to the ground truth despite its lightweight design. BTD-RF thus offers a compelling trade-off between model size and reconstruction quality for 3D scene reconstruction; its efficient design makes it suitable for resource-constrained applications while delivering high-fidelity results, paving the way for broader NeRF utilization. The code is available at https://github.com/seonbin-kim/BTDRF
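As a rough illustration of the idea behind the abstract, a block-term decomposition approximates a dense 3D feature grid as a sum of small Tucker blocks, T ≈ Σ_r G_r ×₁ A_r ×₂ B_r ×₃ C_r, which is where the parameter savings come from. The NumPy sketch below uses toy shapes and ranks and is not the paper's implementation; all names here are illustrative.

```python
import numpy as np

def block_term_reconstruct(cores, factors):
    """Reconstruct a 3D tensor from block terms: T = sum_r G_r x1 A_r x2 B_r x3 C_r."""
    T = 0
    for G, (A, B, C) in zip(cores, factors):
        # One Tucker block via mode products:
        # G is (p, q, s); A is (I, p); B is (J, q); C is (K, s).
        T = T + np.einsum('pqs,ip,jq,ks->ijk', G, A, B, C)
    return T

rng = np.random.default_rng(0)
I, J, K = 8, 8, 8      # toy grid resolution
p, q, s = 2, 2, 2      # multilinear ranks of each block
R = 3                  # number of block terms

cores = [rng.standard_normal((p, q, s)) for _ in range(R)]
factors = [(rng.standard_normal((I, p)),
            rng.standard_normal((J, q)),
            rng.standard_normal((K, s))) for _ in range(R)]

T = block_term_reconstruct(cores, factors)          # shape (8, 8, 8)

# Storage comparison: dense grid vs. block-term parameters.
full = I * J * K                                     # 512
compressed = R * (p * q * s + I * p + J * q + K * s)  # 168
print(T.shape, full, compressed)
```

Here a 512-entry grid is represented by 168 parameters; at realistic grid resolutions the ratio grows rapidly, since the factor matrices scale linearly with each grid dimension rather than with their product.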
Code availability
Researchers or other interested parties may contact the corresponding author, B.C.K., for further explanation; the Python code is also available from the author upon request.
Funding
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (2022R1I1A3058128).
Author information
Contributions
S.B.K., S.K., and D.A. were responsible for the design and overall investigation. B.C.K. was responsible for data curation, supervision, and writing and editing of the manuscript.
Ethics declarations
Conflict of interest/Competing interests
The authors declare that they have no conflict of interest.
Ethics approval
Not applicable
Consent to participate
Not applicable
Consent for publication
Not applicable
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Kim, S.B., Kim, S., Ahn, D. et al. BTD-RF: 3D scene reconstruction using block-term tensor decomposition. Appl Intell (2024). https://doi.org/10.1007/s10489-024-05476-0