Abstract
In many robotics and autonomous-driving tasks, traditional visual SLAM algorithms estimate the camera's pose in a scene from sparse feature points and represent the map as a sparse point cloud with estimated depths. Practical applications, however, require SLAM to build dense maps in real time, overcoming the sparsity and occlusion problems of point clouds. It is also advantageous for the SLAM map to have an auto-completion capability: when the camera observes only 80% of an object, the map can automatically infer and complete the remaining 20%. A denser, more intelligent map representation is therefore needed. In this paper, we propose a visual–inertial SLAM with Neural Radiance Field (NeRF) reconstruction to address these challenges. We integrate traditional rule-based optimization with NeRF, rapidly estimating camera motion and sparse feature-point depths so that local NeRF functions can be updated in real time to reconstruct the 3D scene. To obtain better camera poses and a globally consistent map, we address IMU noise spikes caused by rapid motion changes, as well as pose adjustments arising from loop-closure fusion. Specifically, we widen the static noise covariance to refit the dynamic noise covariance. During loop-closure fusion, we treat the pose adjustment between the pre- and post-loop-closure states as a spatiotemporal transformation, migrating NeRF parameters from the former to the latter to accelerate loop-closure adjustments in the NeRF map. Moreover, we extend the method to scenarios with only grayscale images: by expanding the color channels of grayscale images and applying a linear spatial mapping, we can rapidly reconstruct 3D scenes from grayscale input alone. We demonstrate the accuracy and speed advantages of our method in both RGB and grayscale scenes.
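The grayscale extension described above can be sketched in a few lines. This is a minimal illustrative example, not the paper's implementation: the function name and the per-channel affine coefficients are assumptions (the paper does not specify the exact linear mapping), but it shows the basic idea of replicating the single channel into three color channels and then applying a linear map so the image can feed a standard RGB NeRF pipeline.

```python
import numpy as np

def grayscale_to_rgb(gray: np.ndarray) -> np.ndarray:
    """Expand an HxW grayscale image to HxWx3 and apply a linear mapping.

    The channel is replicated three times, then each channel is passed
    through a hypothetical affine map a*x + b (identity coefficients here
    as a placeholder for whatever linear mapping is actually fitted).
    """
    # Replicate the single channel into three color channels.
    rgb = np.repeat(gray[..., None].astype(np.float32), 3, axis=-1)
    # Placeholder per-channel linear mapping (assumed, not from the paper).
    a = np.array([1.0, 1.0, 1.0], dtype=np.float32)
    b = np.array([0.0, 0.0, 0.0], dtype=np.float32)
    return np.clip(a * rgb + b, 0.0, 255.0)

gray = np.random.randint(0, 256, (4, 4)).astype(np.float32)
rgb = grayscale_to_rgb(gray)
print(rgb.shape)  # (4, 4, 3)
```

With identity coefficients all three output channels equal the input; a fitted linear map would instead shift each channel into the color space expected by the RGB reconstruction pipeline.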
Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Cite this article
Liao, D., Ai, W. VI-NeRF-SLAM: a real-time visual–inertial SLAM with NeRF mapping. J Real-Time Image Proc 21, 30 (2024). https://doi.org/10.1007/s11554-023-01412-6