
Graph attention network-optimized dynamic monocular visual odometry


Abstract

Monocular Visual Odometry (VO) is often formulated as a sequential dynamics problem that relies on the scene rigidity assumption. A key challenge is rejecting moving objects and estimating the camera pose in dynamic environments. Existing methods either weight visual cues across the whole image equally or eliminate fixed semantic categories using heuristics or attention mechanisms; however, they fail to handle unknown dynamic objects that are not labeled in the network's training sets. To address these issues, this paper proposes a novel framework, graph attention network (GAT)-optimized dynamic monocular visual odometry (GDM-VO), which removes dynamic objects explicitly using semantic segmentation and multi-view geometry. First, we employ a multi-task learning network to perform semantic segmentation and depth estimation. We then reject a priori known and unknown moving objects through semantic information and multi-view geometry, respectively. Furthermore, to the best of our knowledge, we are the first to leverage a GAT to capture long-range temporal dependencies from consecutive image sequences adaptively, whereas existing sequential modeling approaches must select information manually. Extensive experiments on the KITTI and TUM datasets demonstrate the superior performance of GDM-VO over existing state-of-the-art classical and learning-based monocular VO methods.
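The abstract's central claim, adaptive selection of temporal information, can be made concrete with a short sketch. Below is a minimal single-head GAT layer over a sliding window of per-frame features, in the spirit of the temporal module described above. The dimensions, names, fully connected temporal graph, and plain-PyTorch formulation are all illustrative assumptions, not the authors' released GDM-VO implementation.

```python
# Minimal single-head graph attention layer over a window of per-frame
# features (a sketch under assumed dimensions, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class TemporalGATLayer(nn.Module):
    """Single-head graph attention layer (after Velickovic et al., 2017)."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)  # shared node projection
        self.a = nn.Linear(2 * out_dim, 1, bias=False)   # attention scorer

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (N, in_dim) node features, one node per frame in the window;
        # here every frame attends to every other frame (fully connected graph).
        z = self.W(h)                                    # (N, out_dim)
        n = z.size(0)
        zi = z.unsqueeze(1).expand(n, n, -1)             # query frame, repeated
        zj = z.unsqueeze(0).expand(n, n, -1)             # candidate neighbors
        e = F.leaky_relu(self.a(torch.cat([zi, zj], dim=-1)).squeeze(-1))
        alpha = torch.softmax(e, dim=-1)                 # learned per-frame weights
        return F.elu(alpha @ z)                          # attention-weighted sum


# Usage: refine 8 consecutive frame embeddings; a pose head could then
# regress 6-DoF relative camera poses from the refined features.
frames = torch.randn(8, 512)
refined = TemporalGATLayer(512, 256)(frames)
print(refined.shape)  # torch.Size([8, 256])
```

Because the attention weights are computed from the data itself, the network decides per frame which past observations matter, in contrast to recurrent memories whose retention is governed by manually designed gating or selection rules.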




Acknowledgements

This work was supported in part by the National Key R&D Program of China under Grant 2018YFE0205503, and in part by the Funds for International Cooperation and Exchange of the NSFC under Grant 61720106007.

Author information

Corresponding author

Correspondence to Qiao Xiuquan.

Ethics declarations

Conflicts of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhang, H., Qiao, X. Graph attention network-optimized dynamic monocular visual odometry. Appl Intell 53, 23067–23082 (2023). https://doi.org/10.1007/s10489-023-04687-1

