Abstract
Object-level landmarks enable a SLAM system to construct robust object-keyframe constraints for bundle adjustment and improve pose estimation performance. In this paper, we present a real-time online object-level SLAM system. A dual Bundle Adjustment (BA) optimization method, operating at high and low frequencies, is proposed to optimize the estimated pose. The High-frequency BA (HBA) module quickly estimates the camera pose by matching landmarks of keyframes with feature points of the current frame. The estimated camera pose is then used in the Low-frequency BA (LBA) module to improve trajectory accuracy; the LBA module integrates object-level landmarks into the pose graph to optimize the camera pose during local mapping. Moreover, we run an additional object detection thread to extract 2D object bounding boxes online, and we improve data association through the depth projection of point-line features and the Euclidean distance between object centroids. Experimental results show that the proposed algorithm effectively reduces the drift error of camera pose estimation and improves accuracy by a large margin on different datasets.
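The centroid-based data association mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation; it assumes a hypothetical greedy nearest-centroid matcher in which each detected object centroid (recovered, e.g., from the depth projection of point-line features inside its bounding box) is matched to the closest existing object landmark within a distance gate, and unmatched detections spawn new landmarks. The function name, gate value, and data layout are all illustrative assumptions.

```python
import math

def associate_objects(detections, landmarks, max_dist=0.5):
    """Greedy nearest-centroid association (illustrative sketch).

    detections: list of (x, y, z) object centroids from the current frame.
    landmarks:  dict mapping landmark id -> (x, y, z) map centroid.
    max_dist:   Euclidean distance gate; beyond it a detection is
                treated as a new, unmatched object.
    Returns a dict: detection index -> matched landmark id, or None
    when the detection should initialize a new object landmark.
    """
    matches = {}
    used = set()  # each landmark may be matched at most once
    for i, det in enumerate(detections):
        best_id, best_d = None, max_dist
        for lid, centroid in landmarks.items():
            if lid in used:
                continue
            d = math.dist(det, centroid)  # Euclidean distance of centroids
            if d < best_d:
                best_id, best_d = lid, d
        if best_id is not None:
            used.add(best_id)
        matches[i] = best_id
    return matches
```

In practice a SLAM front end would also gate on the detected object class and handle ambiguous matches (e.g., via Hungarian assignment rather than a greedy pass), but the sketch captures the centroid-distance criterion the abstract describes.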
Notes
Our code is made publicly available at: https://github.com/Jake755/object_slam.git
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China under Grant U2033218, 61831018, 61802253, in part by Shanghai Local Capacity Enhancement project (No. 21010501500), in part by “Science and Technology Innovation Action Plan” of Shanghai Science and Technology Commission for social development project under Grant 21DZ1204900.
Ethics declarations
Conflict of interest
The corresponding author of this paper is the associate editor of Applied Intelligence.
Cite this article
Liu, J., Gao, Y., Jiang, X. et al. Online object-level SLAM with dual bundle adjustment. Appl Intell 53, 25092–25105 (2023). https://doi.org/10.1007/s10489-023-04854-4