
Visual SLAM for Mobile Robot

Chapter in Introduction to Intelligent Robot System Design

Abstract

During the early developmental stage of visual SLAM technology, other sensor information was not considered and only the camera was used. According to the tracking algorithm, visual SLAM (VSLAM) can be classified into direct methods and feature-based (or indirect) methods. According to the optimization back end, it can be classified into filter-based approaches (e.g., MonoSLAM) and graph-based approaches (e.g., parallel tracking and mapping, PTAM). When image features are lost, the accuracy and robustness of pure vision-based pose estimation degrade rapidly and the algorithm may fail. Therefore, during subsequent development, visual SLAM techniques based on the fusion of multiple sensors, such as vision, IMU, and LiDAR, have emerged.


Notes

  1. C. Campos, R. Elvira, J. J. G. Rodríguez, J. M. M. Montiel, and J. D. Tardós. ORB-SLAM3: An accurate open-source library for visual, visual-inertial, and multimap SLAM. IEEE Transactions on Robotics, 2021, 37(6):1874–1890.

  2. T. Shan, B. Englot, C. Ratti, and D. Rus. LVI-SAM: Tightly-coupled lidar-visual-inertial odometry via smoothing and mapping. 2021 IEEE International Conference on Robotics and Automation (ICRA), 2021, pp. 5692–5698.


Author information

Correspondence to Gang Peng.

Appendices

Further Reading

  1. ORB-SLAM3

    A typical visual SLAM system mainly includes data processing, initialization, visual odometry, map maintenance, loop closure detection, and other modules. Compared with the previous two generations, ORB-SLAM3 (Note 1) mainly adds inertial sensors on the basis of monocular, stereo, and RGB-D vision and introduces visual-inertial fusion, a multi-map mode (Atlas), and map merging. It is a full multi-map and multi-session system able to work in pure visual or visual-inertial modes with monocular, stereo, or RGB-D sensors, using pinhole and fisheye camera models. As shown in Fig. 10.31, it has four main modules: the tracking thread, the local mapping thread, the loop and map merging thread, and the Atlas.

    The tracking thread processes sensor information and computes the pose of the current frame with respect to the active map in real time, minimizing the reprojection error of the matched map features. It also decides whether the current frame becomes a key frame. In visual-inertial mode, the body velocity and IMU biases are estimated by including the inertial residuals in the optimization. When tracking is lost, the tracking thread tries to relocalize the current frame in all the maps of the Atlas. If relocalization succeeds, tracking is resumed, switching the active map if needed. Otherwise, after a certain time, the active map is stored as non-active, and a new active map is initialized.
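    The core of this tracking step is pose estimation by minimizing reprojection error over matched 3D-2D correspondences. The following is a minimal sketch of that idea (not ORB-SLAM3's own optimizer), assuming OpenCV and synthetic data; the map points, observations, and intrinsic matrix K are placeholders.

```python
import numpy as np
import cv2

K = np.array([[525.0, 0.0, 320.0],
              [0.0, 525.0, 240.0],
              [0.0, 0.0, 1.0]])           # assumed pinhole intrinsics
dist = np.zeros(5)                        # assume undistorted images

# Synthetic map points and a "ground-truth" pose used only to create observations.
map_points_3d = np.random.rand(50, 3) * 5.0
rvec_gt = np.array([0.1, -0.2, 0.05])
tvec_gt = np.array([0.3, 0.1, 1.5])
image_points_2d, _ = cv2.projectPoints(map_points_3d, rvec_gt, tvec_gt, K, dist)
image_points_2d = image_points_2d.reshape(-1, 2)

# Robust PnP: estimates the camera pose that minimizes reprojection error over inliers.
ok, rvec, tvec, inliers = cv2.solvePnPRansac(map_points_3d, image_points_2d, K, dist)
R, _ = cv2.Rodrigues(rvec)                # world-to-camera rotation matrix
print(ok, rvec.ravel().round(3), tvec.ravel().round(3))
```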

    The loop and map merging thread detects common regions between the active map and the whole Atlas at key-frame rate. If the common region belongs to the active map, it performs loop correction; if it belongs to a different map, both maps are seamlessly merged into a single one, which becomes the active map. After a loop correction, a full bundle adjustment (BA) is launched in an independent thread to further refine the map without affecting real-time performance.

    The local mapping thread adds key frames and points to the active map, removes redundant ones, and refines the map using visual or visual-inertial bundle adjustment, operating in a local window of key frames close to the current frame. Additionally, in the inertial case, the IMU parameters are initialized and refined by the mapping thread using a maximum a posteriori (MAP) estimation technique.
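    The windowed refinement is a bundle adjustment: key-frame poses and map points are adjusted jointly to minimize the total reprojection error. Below is a small, self-contained sketch of visual-only bundle adjustment with SciPy on synthetic data; it illustrates the optimization problem, not ORB-SLAM3's g2o-based implementation, and the camera count, intrinsics, and noise levels are assumptions.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

rng = np.random.default_rng(0)
n_cams, n_pts = 3, 40
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
points = rng.uniform([-2, -2, 4], [2, 2, 8], size=(n_pts, 3))         # true 3D points
cam_centers = np.array([[0.5 * i, 0.0, 0.0] for i in range(n_cams)])  # true camera centers

def project(rvec, tvec, pts):
    """Pinhole projection of world points into one camera (rotation from axis-angle rvec)."""
    pc = Rotation.from_rotvec(rvec).apply(pts) + tvec
    uv = (K @ pc.T).T
    return uv[:, :2] / uv[:, 2:3]

# Noiseless observations of every point in every key frame of the window.
obs = np.stack([project(np.zeros(3), -c, points) for c in cam_centers])

def residuals(x):
    """Stacked reprojection residuals over all key frames and points."""
    cams = x[:n_cams * 6].reshape(n_cams, 6)       # [rvec | tvec] per key frame
    pts = x[n_cams * 6:].reshape(n_pts, 3)
    return np.concatenate(
        [project(c[:3], c[3:], pts) - obs[i] for i, c in enumerate(cams)]).ravel()

# Perturbed initial guess, as tracking and triangulation would provide it.
x0 = np.concatenate(
    [np.concatenate([np.zeros(3), -c]) + rng.normal(0, 0.02, 6) for c in cam_centers]
    + [points.ravel() + rng.normal(0, 0.05, n_pts * 3)])
sol = least_squares(residuals, x0)                 # jointly refines poses and points
print("cost before:", 0.5 * np.sum(residuals(x0) ** 2),
      "cost after:", 0.5 * np.sum(sol.fun ** 2))
```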

    The Atlas is a multi-map representation composed of a set of disconnected maps. It contains one active map, in which the tracking thread localizes incoming frames and which the local mapping thread continuously optimizes and grows with new key frames; the other maps in the Atlas are non-active. The system builds a unique DBoW2 database of key frames that is used for relocalization, loop closing, and map merging.
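    As an illustration only, the bookkeeping described above can be pictured with a toy Python data structure; the class and method names below are hypothetical, and the bag-of-words vectors merely stand in for DBoW2 entries.

```python
from dataclasses import dataclass, field

@dataclass
class Map:
    keyframes: list = field(default_factory=list)
    points: list = field(default_factory=list)

class Atlas:
    """Toy multi-map container: one active map, non-active maps, shared keyframe database."""
    def __init__(self):
        self.active_map = Map()
        self.nonactive_maps = []
        self.keyframe_db = {}            # keyframe id -> bag-of-words vector (stand-in for DBoW2)

    def add_keyframe(self, kf_id, bow_vector):
        self.active_map.keyframes.append(kf_id)
        self.keyframe_db[kf_id] = bow_vector

    def new_active_map(self):
        """Called when tracking is lost and cannot relocalize: store the old map as non-active."""
        self.nonactive_maps.append(self.active_map)
        self.active_map = Map()

    def merge_into_active(self, other):
        """Merge a non-active map into the active one after a common region is found."""
        self.active_map.keyframes += other.keyframes
        self.active_map.points += other.points
        self.nonactive_maps = [m for m in self.nonactive_maps if m is not other]
```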

  2. LVI-SAM

    LVI-SAM (Note 2) is built atop a factor graph and is composed of two subsystems: a visual-inertial system (VIS) and a lidar-inertial system (LIS). The two subsystems are designed in a tightly coupled manner, in which the VIS leverages the LIS estimate to facilitate initialization. The accuracy of the VIS is improved by extracting depth information for visual features from LiDAR measurements. In turn, the LIS uses the VIS estimate as an initial guess to support scan matching. Loop closures are first identified by the VIS and further refined by the LIS. LVI-SAM remains effective when one of the two subsystems fails, which increases robustness in both texture-less and feature-less environments.

    As shown in Fig. 10.32, the VIS and LIS can function independently while using information from each other to increase system accuracy and robustness. The system outputs pose estimates at the IMU rate. The VIS processes images and IMU measurements, with LiDAR measurements being optional; visual odometry is obtained by minimizing the joint residuals of the visual and IMU measurements. The LIS extracts LiDAR features and performs LiDAR odometry by matching the extracted features against a feature map, which is maintained in a sliding-window manner for real-time performance. Finally, the state estimation problem, formulated as a maximum a posteriori (MAP) problem, is solved by jointly optimizing the IMU preintegration constraints, visual odometry constraints, LiDAR odometry constraints, and loop closure constraints in a factor graph using iSAM2. Note that the multi-sensor graph optimization employed in the LIS is intended to reduce data exchange and improve system efficiency.
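    To make the factor-graph formulation concrete, the sketch below builds a small pose graph with GTSAM and optimizes it incrementally with iSAM2, assuming the gtsam Python bindings are installed. The odometry factors are placeholders for the IMU preintegration, visual odometry, and LiDAR odometry constraints named above, and the final loop closure factor closes the chain; this is not LVI-SAM's implementation.

```python
import numpy as np
import gtsam
from gtsam.symbol_shorthand import X   # X(i) denotes the i-th robot pose

isam = gtsam.ISAM2()
graph = gtsam.NonlinearFactorGraph()
initial = gtsam.Values()

prior_noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.01] * 6))
odom_noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.05] * 6))

# Anchor the first pose with a prior factor.
graph.add(gtsam.PriorFactorPose3(X(0), gtsam.Pose3(), prior_noise))
initial.insert(X(0), gtsam.Pose3())
isam.update(graph, initial)

# Incrementally add odometry-style constraints (placeholders for VIS/LIS outputs).
delta = gtsam.Pose3(gtsam.Rot3(), gtsam.Point3(1.0, 0.0, 0.0))   # move 1 m forward
for i in range(1, 5):
    graph = gtsam.NonlinearFactorGraph()
    initial = gtsam.Values()
    graph.add(gtsam.BetweenFactorPose3(X(i - 1), X(i), delta, odom_noise))
    initial.insert(X(i), isam.calculateEstimate().atPose3(X(i - 1)).compose(delta))
    isam.update(graph, initial)

# A loop closure factor relating the last pose back to the first one.
graph = gtsam.NonlinearFactorGraph()
graph.add(gtsam.BetweenFactorPose3(
    X(4), X(0), gtsam.Pose3(gtsam.Rot3(), gtsam.Point3(-4.0, 0.0, 0.0)), odom_noise))
isam.update(graph, gtsam.Values())

result = isam.calculateEstimate()       # MAP estimate after incremental smoothing
print(result.atPose3(X(4)).translation())
```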

Fig. 10.31 Main system components of ORB-SLAM3

Fig. 10.32 System structure of LVI-SAM

Exercises

  1. What are the optimization components used in the ORB-SLAM2 algorithm process?

  2. Are there any other optimization methods commonly used for back-end optimization? Describe these methods.

  3. How are key frames selected?

  4. Learn about the DBoW2 library and try to find a few images to test whether loop closures can be detected correctly. What other methods can be used for loop closure detection besides the bag-of-words model?

  5. How many map representations are there for visual SLAM? Which application scenarios is each suitable for?

  6. What other methods are available for solving for camera motion besides epipolar geometry and PnP? Describe these methods.

  7. In a visual SLAM algorithm, the pose change is calculated from the matched feature points of adjacent frames. Assume a point has coordinates (x, y, z) and is rotated by a, b, and c degrees about the x-axis, y-axis, and z-axis, respectively. Derive the corresponding rotation matrix R.

  8. There are many representations of rotation, including the matrix, axis-angle, and quaternion representations, and it is important to understand the conversions between them. Derive the Rodrigues formula, which expresses an axis-angle rotation as a rotation matrix.

  9. When using the direct linear transform to solve PnP, if there are too many points, the coefficient matrix A in the equation Ax = b becomes overdetermined. Prove that when A is overdetermined (with AᵀA invertible), the least-squares solution of Ax = b is x = (AᵀA)⁻¹Aᵀb.

  10. Briefly describe the steps of monocular dense mapping and RGB-D dense mapping. Explain their differences and connections.

  11. Briefly describe the application scenarios of at least three common visual SLAM methods. Compare their advantages and disadvantages.

  12. The figure below shows the main functions of a typical mobile robot system in operation: pose estimation, environmental mapping, navigation, and obstacle avoidance. Analyze the individual ROS nodes shown in the figure.

Figure: A high-performance PC (left) runs pose estimation, space environment mapping, and map-based autonomous navigation and obstacle avoidance; an embedded controller (right) runs motion control, a ROS serial client node, and peripheral sensor data acquisition.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter


Cite this chapter

Peng, G., Lam, T.L., Hu, C., Yao, Y., Liu, J., Yang, F. (2023). Visual SLAM for Mobile Robot. In: Introduction to Intelligent Robot System Design. Springer, Singapore. https://doi.org/10.1007/978-981-99-1814-0_10
