Abstract
During the early developmental stage of visual SLAM technology, other sensor information was not considered, and only the camera was used. Depending on the tracking algorithm, visual SLAM (VSLAM) can be classified into direct and feature-based (indirect) methods. Depending on the optimization framework, it can be classified into filter-based methods (e.g., MonoSLAM) and graph-based methods (e.g., parallel tracking and mapping, PTAM). When image features are lost, the accuracy and robustness of pose estimation in pure vision-based SLAM degrade rapidly, and the algorithm may fail. Therefore, during subsequent development, visual SLAM techniques based on the fusion of multiple sensors, such as vision, IMU, and LiDAR, have emerged.
Notes
- 1. Campos C, Elvira R, Rodríguez JJG, Montiel JMM, Tardós JD (2021) ORB-SLAM3: an accurate open-source library for visual, visual-inertial, and multimap SLAM. IEEE Trans Robot 37(6):1874–1890
- 2. Shan T, Englot B, Ratti C, Rus D (2021) LVI-SAM: tightly-coupled lidar-visual-inertial odometry via smoothing and mapping. 2021 IEEE International Conference on Robotics and Automation (ICRA), pp 5692–5698
Appendices
Further Reading
1. ORB-SLAM3
A typical visual SLAM system mainly includes data processing, initialization, visual odometry, map maintenance, loop closure detection, and other parts. Compared with the previous two generations, ORB-SLAM3Footnote 1 mainly adds inertial sensors on the basis of monocular, stereo, and RGB-D vision and introduces a visual-inertial mode, a multi-map mode (Atlas), and map merging. It is a full multi-map and multi-session system able to work in pure visual or visual-inertial modes with monocular, stereo, or RGB-D sensors, using pinhole and fisheye camera models. As shown in Fig. 10.31, the system has four main modules: the tracking thread, the loop and map merging thread, the local mapping thread, and the Atlas.
The tracking thread processes sensor information and computes the pose of the current frame with respect to the active map in real time, minimizing the reprojection error of the matched map features. It also decides whether the current frame becomes a key frame. In visual-inertial mode, the body velocity and IMU biases are estimated by an optimization that includes the inertial residuals. When tracking is lost, the tracking thread tries to re-localize the current frame in all the Atlas maps. If re-localization succeeds, tracking is resumed, switching the active map if needed. Otherwise, after a certain time, the active map is stored as nonactive, and a new active map is initialized.
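The reprojection-error minimization performed by the tracking thread can be illustrated with a small sketch. This is not ORB-SLAM3's actual implementation; the pinhole intrinsics, pose, and points below are invented for the example:

```python
import numpy as np

def reproject(K, R, t, points_3d):
    """Project world points into the image with a pinhole camera model."""
    p_cam = (R @ points_3d.T).T + t          # world frame -> camera frame
    p_img = (K @ p_cam.T).T                  # camera frame -> homogeneous pixels
    return p_img[:, :2] / p_img[:, 2:3]      # perspective division

def reprojection_error(K, R, t, points_3d, observations):
    """Mean squared pixel error between projected map points and their matches."""
    residuals = reproject(K, R, t, points_3d) - observations
    return float(np.mean(np.sum(residuals**2, axis=1)))

# Illustrative intrinsics and an identity pose
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
R, t = np.eye(3), np.zeros(3)
pts = np.array([[0.0, 0.0, 2.0], [0.5, -0.2, 3.0]])
obs = reproject(K, R, t, pts)                 # perfect matches
print(reprojection_error(K, R, t, pts, obs))  # -> 0.0
```

In the real tracker this error is minimized over the pose (R, t) by nonlinear optimization rather than merely evaluated.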
The loop and map merging thread detects common regions between the active map and the whole Atlas at the key frame rate. If the common region belongs to the active map, it performs loop correction; if it belongs to a different map, both maps are seamlessly merged into a single one, which becomes the active map. After a loop correction, a full bundle adjustment (BA) is launched in an independent thread to further refine the map without affecting real-time performance.
The local mapping thread adds key frames and points to the active map, removes the redundant ones, and refines the map using visual or visual-inertial bundle adjustment, operating in a local window of key frames close to the current frame. Additionally, in the inertial case, the IMU parameters are initialized and refined by the mapping thread using a novel MAP estimation technique.
The Atlas is a multi-map representation composed of a set of disconnected maps. There is one active map, in which the tracking thread localizes the incoming frames and which is continuously optimized and grown with new key frames by the local mapping thread; the other maps in the Atlas are nonactive. The system builds a unique DBoW2 database of key frames that is used for re-localization, loop closing, and map merging.
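The Atlas bookkeeping described above can be sketched as a small data structure. This is a simplified illustration, not ORB-SLAM3's actual classes:

```python
class Atlas:
    """Simplified multi-map container: one active map, the rest nonactive."""
    def __init__(self):
        self.maps = [[]]          # each map is a list of key frames
        self.active = 0           # index of the active map

    def add_keyframe(self, kf):
        self.maps[self.active].append(kf)

    def on_tracking_lost(self, relocalized_map=None):
        """Switch to the map where re-localization succeeded, or start a new one."""
        if relocalized_map is not None:
            self.active = relocalized_map      # resume in an existing map
        else:
            self.maps.append([])               # old map stays stored as nonactive
            self.active = len(self.maps) - 1   # initialize a new active map

atlas = Atlas()
atlas.add_keyframe("kf0")
atlas.on_tracking_lost()              # re-localization failed: new active map
atlas.add_keyframe("kf1")
print(len(atlas.maps), atlas.active)  # -> 2 1
```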
2. LVI-SAM
LVI-SAMFootnote 2 is built atop a factor graph and is composed of two subsystems: a visual-inertial system (VIS) and a lidar-inertial system (LIS). The two subsystems are designed in a tightly coupled manner, in which the VIS leverages LIS estimation to facilitate initialization. The accuracy of the VIS is improved by extracting depth information for visual features using LiDAR measurements. In turn, the LIS utilizes VIS estimation for initial guesses to support scan-matching. Loop closures are first identified by the VIS and further refined by the LIS. LVI-SAM can also be effective when one of the two subsystems fails, which increases robustness in both texture-less and feature-less environments.
As shown in Fig. 10.32, the VIS and LIS can function independently while using information from each other to increase system accuracy and robustness. The system outputs pose estimates at the IMU rate. The VIS processes images and IMU measurements, with LiDAR measurements being optional; visual odometry is obtained by minimizing the joint residuals of the visual and IMU measurements. The LIS extracts LiDAR features and performs LiDAR odometry by matching the extracted features with a feature map, which is maintained in a sliding-window manner for real-time performance. Lastly, the state estimation problem, formulated as a maximum a posteriori (MAP) problem, is solved by jointly optimizing the IMU preintegration constraints, visual odometry constraints, LiDAR odometry constraints, and loop closure constraints in a factor graph using iSAM2. Note that the multi-sensor graph optimization employed in the LIS is intended to reduce data exchange and improve system efficiency.
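The joint MAP optimization described above amounts to minimizing a weighted sum of residuals, one term per constraint type. A minimal one-dimensional illustration follows; the measurements and information weights are invented for the example, and a real system would optimize a full trajectory incrementally with iSAM2 rather than solving a scalar in closed form:

```python
import numpy as np

# Each factor constrains the state x with a measurement z and a weight w,
# where w is the inverse measurement variance (the information).
factors = [
    ("imu_preintegration", 1.02, 100.0),
    ("visual_odometry",    0.98,  50.0),
    ("lidar_odometry",     1.00, 200.0),
    ("loop_closure",       0.99, 150.0),
]

def map_estimate(factors):
    """Minimize sum_i w_i * (x - z_i)^2: the information-weighted average."""
    z = np.array([f[1] for f in factors])
    w = np.array([f[2] for f in factors])
    return float(np.sum(w * z) / np.sum(w))

x_hat = map_estimate(factors)
print(round(x_hat, 4))  # -> 0.999
```

Constraints with higher information (here, the LiDAR odometry factor) pull the estimate harder, which is exactly the behavior the factor-graph formulation provides in higher dimensions.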
Exercises
1. What are the optimization components used in the ORB-SLAM2 algorithm process?
2. Are there any other optimization methods commonly used for back-end optimization? Describe these methods.
3. How are key frames selected?
4. Learn about the DBoW2 library and try to find a few images to test whether loop closures can be detected correctly. What other methods can be used for loop closure detection besides the bag-of-words model?
5. How many representations of maps for visual SLAM are there? What application scenarios is each suitable for?
6. What other methods are available for solving camera motion besides epipolar geometry and PnP? Describe these methods.
7. In a visual SLAM algorithm, the pose change is calculated from the matching feature points of adjacent frames. Assume that a point has coordinates (x, y, z) and is rotated by a, b, and c degrees about the x-axis, y-axis, and z-axis, respectively. Derive the corresponding rotation matrix R.
8. There are many representations of rotation, including the matrix, axis-angle, and quaternion representations, and it is very important to understand the conversions between them. Derive the Rodrigues formula, which shows how to express an axis-angle rotation as a rotation matrix.
9. When using the direct linear transform to solve PnP, if there are too many points, the coefficient matrix A in the equation Ax = b will be overdetermined. Prove that when A is overdetermined, the least-squares solution of Ax = b is x = (A^T A)^(-1) A^T b.
10. Briefly describe the steps of monocular dense mapping and RGB-D dense mapping. Explain their differences and connections.
11. Briefly describe the application scenarios of at least three common visual SLAM methods. Compare their advantages and disadvantages.
12. The figure below contains the main functions of a typical mobile robot system in operation: pose estimation, environmental mapping, navigation, and obstacle avoidance. Analyze the individual ROS nodes shown in the figure.
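Exercises 8 and 9 can be checked numerically. The sketch below implements the Rodrigues formula and compares the normal-equation least-squares solution against NumPy's solver; the test rotation and the overdetermined system are made-up data for illustration:

```python
import numpy as np

def rodrigues(axis, theta):
    """Axis-angle -> rotation matrix: R = I + sin(t) K + (1 - cos(t)) K^2."""
    n = axis / np.linalg.norm(axis)
    K = np.array([[0, -n[2], n[1]],
                  [n[2], 0, -n[0]],
                  [-n[1], n[0], 0]])          # skew-symmetric matrix of n
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

# Rotation by 90 degrees about z maps the x-axis to the y-axis
R = rodrigues(np.array([0.0, 0.0, 1.0]), np.pi / 2)
print(np.round(R @ np.array([1.0, 0.0, 0.0]), 6))  # -> [0. 1. 0.]

# Overdetermined Ax = b: normal equations give x = (A^T A)^(-1) A^T b
A = np.array([[1.0, 1], [1, 2], [1, 3], [1, 4]])
b = np.array([6.0, 5, 7, 10])
x_normal = np.linalg.inv(A.T @ A) @ A.T @ b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x_normal, x_lstsq))              # -> True
```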
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this chapter
Peng, G., Lam, T.L., Hu, C., Yao, Y., Liu, J., Yang, F. (2023). Visual SLAM for Mobile Robot. In: Introduction to Intelligent Robot System Design. Springer, Singapore. https://doi.org/10.1007/978-981-99-1814-0_10
DOI: https://doi.org/10.1007/978-981-99-1814-0_10
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-1813-3
Online ISBN: 978-981-99-1814-0
eBook Packages: Intelligent Technologies and Robotics (R0)