Abstract
In recent years, deep-learning-based visual-inertial odometry (VIO) has shown significant advantages over traditional geometric methods. However, existing methods estimate every pose from both visual and inertial measurements, which introduces substantial computational redundancy and leads to high time and hardware costs during training and on-device deployment. To maintain accuracy while reducing the number of trainable parameters, we propose an improved algorithm based on Visual-Selective-VIO. We design a dedicated attention mechanism for the visual branch together with a lightweight pose-estimation module. In the improved visual branch, attention feature maps are computed sequentially over both the channel and spatial dimensions; these two maps are then multiplied with the original input feature maps for adaptive feature recalibration. This improves the model's sensitivity to channel features and enables more accurate localization. Experimental results show that our algorithm maintains accuracy with a 10% reduction in network parameters compared with a state-of-the-art VIO algorithm, making it better suited to training on large-scale datasets and to deployment in practical applications.
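The sequential channel-then-spatial attention with multiplicative recalibration described in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the weights `w1` and `w2` stand in for a learned channel-reduction MLP, and the spatial gate uses fixed equal pooling weights where a full implementation would apply a learned convolution.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, w1, w2):
    # x: (C, H, W). Squeeze spatial dims with average- and max-pooling,
    # pass both through a shared two-layer MLP, then gate each channel.
    avg = x.mean(axis=(1, 2))   # (C,)
    mx = x.max(axis=(1, 2))     # (C,)
    gate = sigmoid(w2 @ np.maximum(w1 @ avg, 0.0)
                   + w2 @ np.maximum(w1 @ mx, 0.0))  # (C,)
    return x * gate[:, None, None]   # multiplicative recalibration per channel

def spatial_attention(x):
    # x: (C, H, W). Pool across channels and gate each spatial location.
    avg = x.mean(axis=0)        # (H, W)
    mx = x.max(axis=0)          # (H, W)
    # Placeholder: a learned kernel over [avg; max] would normally produce
    # the spatial gate; fixed equal weights are used here for illustration.
    gate = sigmoid(0.5 * avg + 0.5 * mx)
    return x * gate[None, :, :]

C, H, W = 8, 4, 4
rng = np.random.default_rng(0)
x = rng.standard_normal((C, H, W))
w1 = 0.1 * rng.standard_normal((C // 2, C))  # reduction ratio 2 (assumed)
w2 = 0.1 * rng.standard_normal((C, C // 2))
y = spatial_attention(channel_attention(x, w1, w2))
print(y.shape)  # (8, 4, 4): same shape as the input feature map
```

Because both gates only rescale activations, the output keeps the input's shape, so the module can be dropped between existing layers of the visual branch without architectural changes.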
Acknowledgement
This work was supported by the Youth Foundations of Shandong Province under Grant Nos. ZR202102230323 and ZR2021QF130, the National Natural Science Foundation of China under Grant No. 62273163, and the Key R&D Project of Shandong Province under Grant No. 2022CXGC010503.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Lu, Y., Yin, X., Qin, F., Huang, K., Zhang, M., Huang, W. (2023). A Lightweight Sensor Fusion for Neural Visual Inertial Odometry. In: Zhang, H., et al. International Conference on Neural Computing for Advanced Applications. NCAA 2023. Communications in Computer and Information Science, vol 1870. Springer, Singapore. https://doi.org/10.1007/978-981-99-5847-4_4
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-5846-7
Online ISBN: 978-981-99-5847-4
eBook Packages: Computer Science (R0)