Abstract
The Neural Radiance Field (NeRF) achieves excellent performance on view-synthesis tasks, but it requires a large amount of memory and many model parameters for three-dimensional (3D) scene reconstruction. This paper proposes the block-term tensor decomposition radiance field (BTD-RF), a novel approach that achieves substantial model compression while preserving reconstruction quality. BTD-RF decomposes the high-dimensional radiance field into low-dimensional tensor blocks, yielding a model 2.21 times smaller than the baseline method. This decomposition also allows the standard multi-head attention of transformers to be replaced with a lightweight multi-linear attention mechanism that employs element-wise products and parameter sharing, significantly reducing model complexity without compromising performance. Extensive evaluations on various datasets demonstrate that BTD-RF achieves superior image reconstruction quality compared with prior methods. Quantitative metrics and qualitative assessments confirm that BTD-RF generates images that are structurally and perceptually close to the ground truth despite its lightweight design. BTD-RF thus offers a compelling trade-off between model size and reconstruction quality for 3D scene reconstruction; its efficient design makes it suitable for resource-constrained applications while delivering high-fidelity results, paving the way for broader NeRF utilization. The code is available at https://github.com/seonbin-kim/BTDRF
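As a rough illustration of the idea behind the abstract, a block-term decomposition approximates a dense 3D feature grid as a sum of small Tucker blocks, T ≈ Σ_r G_r ×₁ A_r ×₂ B_r ×₃ C_r, which is where the parameter savings come from. The NumPy sketch below uses toy shapes and ranks and is not the paper's implementation; all names here are illustrative.

```python
import numpy as np

def block_term_reconstruct(cores, factors):
    """Reconstruct a 3D tensor from block terms: T = sum_r G_r x1 A_r x2 B_r x3 C_r."""
    T = 0
    for G, (A, B, C) in zip(cores, factors):
        # One Tucker block via mode products:
        # G is (p, q, s); A is (I, p); B is (J, q); C is (K, s).
        T = T + np.einsum('pqs,ip,jq,ks->ijk', G, A, B, C)
    return T

rng = np.random.default_rng(0)
I, J, K = 8, 8, 8      # toy grid resolution
p, q, s = 2, 2, 2      # multilinear ranks of each block
R = 3                  # number of block terms

cores = [rng.standard_normal((p, q, s)) for _ in range(R)]
factors = [(rng.standard_normal((I, p)),
            rng.standard_normal((J, q)),
            rng.standard_normal((K, s))) for _ in range(R)]

T = block_term_reconstruct(cores, factors)          # shape (8, 8, 8)

# Storage comparison: dense grid vs. block-term parameters.
full = I * J * K                                     # 512
compressed = R * (p * q * s + I * p + J * q + K * s)  # 168
print(T.shape, full, compressed)
```

Here a 512-entry grid is represented by 168 parameters; at realistic grid resolutions the ratio grows rapidly, since the factor matrices scale linearly with each grid dimension rather than with their product.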
Code availability
Researchers or other interested parties may contact the corresponding author, B.C.K., for further explanation; the Python code is also available from the author upon request.
Funding
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (2022R1I1A3058128).
Author information
Contributions
S.B.K., S.K., and D.A. were responsible for the design and overall investigation. B.C.K. was responsible for data curation, supervision, and writing and editing of the manuscript.
Ethics declarations
Conflict of interest/Competing interests
The authors declare that they have no conflict of interest.
Ethics approval
Not applicable
Consent to participate
Not applicable
Consent for publication
Not applicable
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Kim, S.B., Kim, S., Ahn, D. et al. BTD-RF: 3D scene reconstruction using block-term tensor decomposition. Appl Intell (2024). https://doi.org/10.1007/s10489-024-05476-0