
Visual SLAM for Mobile Robot

Chapter in Introduction to Intelligent Robot System Design

Abstract

During the early developmental stage of visual SLAM technology, other sensor information was not considered and only the camera was used. According to the tracking algorithm, visual SLAM (VSLAM) can be classified into direct methods and feature-based (or indirect) methods. According to the optimization back end, it can be classified into filter-based approaches (e.g., MonoSLAM) and graph-based approaches (e.g., parallel tracking and mapping, PTAM). When image features are lost, the accuracy and robustness of pure vision-based pose estimation degrade rapidly and the algorithm may fail. Therefore, during subsequent development, visual SLAM techniques based on the fusion of multiple sensors, such as vision, IMU, and LiDAR, have emerged.


Notes

  1. C. Campos, R. Elvira, J. J. G. Rodríguez, J. M. M. Montiel, and J. D. Tardós. ORB-SLAM3: An accurate open-source library for visual, visual-inertial, and multimap SLAM. IEEE Transactions on Robotics, 2021, 37(6):1874–1890.

  2. T. Shan, B. Englot, C. Ratti, and D. Rus. LVI-SAM: Tightly-coupled lidar-visual-inertial odometry via smoothing and mapping. 2021 IEEE International Conference on Robotics and Automation (ICRA), 2021, pp. 5692–5698.


Author information

Correspondence to Gang Peng.

Appendices

Further Reading

  1. ORB-SLAM3

    A typical visual SLAM system mainly includes data processing, initialization, visual odometry, map maintenance, loop closure detection, and other modules. Compared with the previous two generations, ORB-SLAM3 (Note 1) mainly adds inertial sensors on the basis of monocular, stereo, and RGB-D vision and introduces visual-inertial fusion, a multi-map mode (Atlas), and map merging. It is a full multi-map and multi-session system able to work in pure visual or visual-inertial modes with monocular, stereo, or RGB-D sensors, using pinhole and fisheye camera models. As shown in Fig. 10.31, it has four main modules: the tracking thread, the local mapping thread, the loop and map merging thread, and the Atlas.

    The tracking thread processes sensor information and computes the pose of the current frame with respect to the active map in real time, minimizing the reprojection error of the matched map features. It also decides whether the current frame becomes a key frame. In visual-inertial mode, the body velocity and IMU biases are estimated by including the inertial residuals in the optimization. When tracking is lost, the tracking thread tries to relocalize the current frame in all the maps of the Atlas. If relocalization succeeds, tracking is resumed, switching the active map if needed. Otherwise, after a certain time, the active map is stored as non-active, and a new active map is initialized.
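    The core of this tracking step is pose estimation by minimizing reprojection error over matched 3D-2D correspondences. The following is a minimal sketch of that idea (not ORB-SLAM3's own optimizer), assuming OpenCV and synthetic data; the map points, observations, and intrinsic matrix K are placeholders.

```python
import numpy as np
import cv2

K = np.array([[525.0, 0.0, 320.0],
              [0.0, 525.0, 240.0],
              [0.0, 0.0, 1.0]])           # assumed pinhole intrinsics
dist = np.zeros(5)                        # assume undistorted images

# Synthetic map points and a "ground-truth" pose used only to create observations.
map_points_3d = np.random.rand(50, 3) * 5.0
rvec_gt = np.array([0.1, -0.2, 0.05])
tvec_gt = np.array([0.3, 0.1, 1.5])
image_points_2d, _ = cv2.projectPoints(map_points_3d, rvec_gt, tvec_gt, K, dist)
image_points_2d = image_points_2d.reshape(-1, 2)

# Robust PnP: estimates the camera pose that minimizes reprojection error over inliers.
ok, rvec, tvec, inliers = cv2.solvePnPRansac(map_points_3d, image_points_2d, K, dist)
R, _ = cv2.Rodrigues(rvec)                # world-to-camera rotation matrix
print(ok, rvec.ravel().round(3), tvec.ravel().round(3))
```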

    The loop and map merging thread detects common regions between the active map and the whole Atlas at key-frame rate. If the common region belongs to the active map, it performs loop correction; if it belongs to a different map, both maps are seamlessly merged into a single one, which becomes the active map. After a loop correction, a full bundle adjustment (BA) is launched in an independent thread to further refine the map without affecting real-time performance.

    The local mapping thread adds key frames and points to the active map, removes redundant ones, and refines the map using visual or visual-inertial bundle adjustment, operating in a local window of key frames close to the current frame. Additionally, in the inertial case, the IMU parameters are initialized and refined by the mapping thread using a maximum a posteriori (MAP) estimation technique.
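    The windowed refinement is a bundle adjustment: key-frame poses and map points are adjusted jointly to minimize the total reprojection error. Below is a small, self-contained sketch of visual-only bundle adjustment with SciPy on synthetic data; it illustrates the optimization problem, not ORB-SLAM3's g2o-based implementation, and the camera count, intrinsics, and noise levels are assumptions.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

rng = np.random.default_rng(0)
n_cams, n_pts = 3, 40
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
points = rng.uniform([-2, -2, 4], [2, 2, 8], size=(n_pts, 3))         # true 3D points
cam_centers = np.array([[0.5 * i, 0.0, 0.0] for i in range(n_cams)])  # true camera centers

def project(rvec, tvec, pts):
    """Pinhole projection of world points into one camera (rotation from axis-angle rvec)."""
    pc = Rotation.from_rotvec(rvec).apply(pts) + tvec
    uv = (K @ pc.T).T
    return uv[:, :2] / uv[:, 2:3]

# Noiseless observations of every point in every key frame of the window.
obs = np.stack([project(np.zeros(3), -c, points) for c in cam_centers])

def residuals(x):
    """Stacked reprojection residuals over all key frames and points."""
    cams = x[:n_cams * 6].reshape(n_cams, 6)       # [rvec | tvec] per key frame
    pts = x[n_cams * 6:].reshape(n_pts, 3)
    return np.concatenate(
        [project(c[:3], c[3:], pts) - obs[i] for i, c in enumerate(cams)]).ravel()

# Perturbed initial guess, as tracking and triangulation would provide it.
x0 = np.concatenate(
    [np.concatenate([np.zeros(3), -c]) + rng.normal(0, 0.02, 6) for c in cam_centers]
    + [points.ravel() + rng.normal(0, 0.05, n_pts * 3)])
sol = least_squares(residuals, x0)                 # jointly refines poses and points
print("cost before:", 0.5 * np.sum(residuals(x0) ** 2),
      "cost after:", 0.5 * np.sum(sol.fun ** 2))
```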

    The Atlas is a multi-map representation composed of a set of disconnected maps. It contains one active map, in which the tracking thread localizes incoming frames and which the local mapping thread continuously optimizes and grows with new key frames; the other maps in the Atlas are non-active. The system builds a unique DBoW2 database of key frames that is used for relocalization, loop closing, and map merging.
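    As an illustration only, the bookkeeping described above can be pictured with a toy Python data structure; the class and method names below are hypothetical, and the bag-of-words vectors merely stand in for DBoW2 entries.

```python
from dataclasses import dataclass, field

@dataclass
class Map:
    keyframes: list = field(default_factory=list)
    points: list = field(default_factory=list)

class Atlas:
    """Toy multi-map container: one active map, non-active maps, shared keyframe database."""
    def __init__(self):
        self.active_map = Map()
        self.nonactive_maps = []
        self.keyframe_db = {}            # keyframe id -> bag-of-words vector (stand-in for DBoW2)

    def add_keyframe(self, kf_id, bow_vector):
        self.active_map.keyframes.append(kf_id)
        self.keyframe_db[kf_id] = bow_vector

    def new_active_map(self):
        """Called when tracking is lost and cannot relocalize: store the old map as non-active."""
        self.nonactive_maps.append(self.active_map)
        self.active_map = Map()

    def merge_into_active(self, other):
        """Merge a non-active map into the active one after a common region is found."""
        self.active_map.keyframes += other.keyframes
        self.active_map.points += other.points
        self.nonactive_maps = [m for m in self.nonactive_maps if m is not other]
```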

  2. LVI-SAM

    LVI-SAM (Note 2) is built atop a factor graph and is composed of two subsystems: a visual-inertial system (VIS) and a lidar-inertial system (LIS). The two subsystems are designed in a tightly coupled manner, in which the VIS leverages the LIS estimate to facilitate initialization. The accuracy of the VIS is improved by extracting depth information for visual features from LiDAR measurements. In turn, the LIS uses the VIS estimate as an initial guess to support scan matching. Loop closures are first identified by the VIS and further refined by the LIS. LVI-SAM remains effective when one of the two subsystems fails, which increases robustness in both texture-less and feature-less environments.

    As shown in Fig. 10.32, the VIS and LIS can function independently while using information from each other to increase system accuracy and robustness. The system outputs pose estimates at the IMU rate. The VIS processes images and IMU measurements, with LiDAR measurements being optional; visual odometry is obtained by minimizing the joint residuals of the visual and IMU measurements. The LIS extracts LiDAR features and performs LiDAR odometry by matching the extracted features against a feature map, which is maintained in a sliding-window manner for real-time performance. Finally, the state estimation problem, formulated as a maximum a posteriori (MAP) problem, is solved by jointly optimizing the IMU preintegration constraints, visual odometry constraints, LiDAR odometry constraints, and loop closure constraints in a factor graph using iSAM2. Note that the multi-sensor graph optimization employed in the LIS is intended to reduce data exchange and improve system efficiency.
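    To make the factor-graph formulation concrete, the sketch below builds a small pose graph with GTSAM and optimizes it incrementally with iSAM2, assuming the gtsam Python bindings are installed. The odometry factors are placeholders for the IMU preintegration, visual odometry, and LiDAR odometry constraints named above, and the final loop closure factor closes the chain; this is not LVI-SAM's implementation.

```python
import numpy as np
import gtsam
from gtsam.symbol_shorthand import X   # X(i) denotes the i-th robot pose

isam = gtsam.ISAM2()
graph = gtsam.NonlinearFactorGraph()
initial = gtsam.Values()

prior_noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.01] * 6))
odom_noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.05] * 6))

# Anchor the first pose with a prior factor.
graph.add(gtsam.PriorFactorPose3(X(0), gtsam.Pose3(), prior_noise))
initial.insert(X(0), gtsam.Pose3())
isam.update(graph, initial)

# Incrementally add odometry-style constraints (placeholders for VIS/LIS outputs).
delta = gtsam.Pose3(gtsam.Rot3(), gtsam.Point3(1.0, 0.0, 0.0))   # move 1 m forward
for i in range(1, 5):
    graph = gtsam.NonlinearFactorGraph()
    initial = gtsam.Values()
    graph.add(gtsam.BetweenFactorPose3(X(i - 1), X(i), delta, odom_noise))
    initial.insert(X(i), isam.calculateEstimate().atPose3(X(i - 1)).compose(delta))
    isam.update(graph, initial)

# A loop closure factor relating the last pose back to the first one.
graph = gtsam.NonlinearFactorGraph()
graph.add(gtsam.BetweenFactorPose3(
    X(4), X(0), gtsam.Pose3(gtsam.Rot3(), gtsam.Point3(-4.0, 0.0, 0.0)), odom_noise))
isam.update(graph, gtsam.Values())

result = isam.calculateEstimate()       # MAP estimate after incremental smoothing
print(result.atPose3(X(4)).translation())
```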

Fig. 10.31 Main system components of ORB-SLAM3

Fig. 10.32 System structure of LVI-SAM

Exercises

  1. What are the optimization components used in the ORB-SLAM2 algorithm process?

  2. Are there any other optimization methods commonly used for back-end optimization? Describe these methods.

  3. How are key frames selected?

  4. Learn about the DBoW2 library and try to find a few images to test whether loop closures can be detected correctly. What other methods can be used for loop closure detection besides the bag-of-words model?

  5. How many map representations are there for visual SLAM? Which application scenarios is each suitable for?

  6. What other methods are available for solving for camera motion besides epipolar geometry and PnP? Describe these methods.

  7. In a visual SLAM algorithm, the pose change is calculated from the matched feature points of adjacent frames. Assume a point has coordinates (x, y, z) and is rotated by a, b, and c degrees about the x-axis, y-axis, and z-axis, respectively. Derive the corresponding rotation matrix R.

  8. There are many representations of rotation, including the matrix, axis-angle, and quaternion representations, and it is important to understand the conversions between them. Derive the Rodrigues formula, which expresses an axis-angle rotation as a rotation matrix.

  9. When using the direct linear transform to solve PnP, if there are too many points, the coefficient matrix A in the equation Ax = b becomes overdetermined. Prove that when A is overdetermined (with AᵀA invertible), the least-squares solution of Ax = b is x = (AᵀA)⁻¹Aᵀb.

  10. Briefly describe the steps of monocular dense mapping and RGB-D dense mapping. Explain their differences and connections.

  11. Briefly describe the application scenarios of at least three common visual SLAM methods. Compare their advantages and disadvantages.

  12. The figure below shows the main functions of a typical mobile robot system in operation: pose estimation, environmental mapping, navigation, and obstacle avoidance. Analyze the individual ROS nodes shown in the figure.

Figure: A high-performance PC (left) runs pose estimation, space environment mapping, and map-based autonomous navigation and obstacle avoidance; an embedded controller (right) runs motion control, a ROS serial client node, and peripheral sensor data acquisition.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter


Cite this chapter

Peng, G., Lam, T.L., Hu, C., Yao, Y., Liu, J., Yang, F. (2023). Visual SLAM for Mobile Robot. In: Introduction to Intelligent Robot System Design. Springer, Singapore. https://doi.org/10.1007/978-981-99-1814-0_10
