Learning the Frame-2-Frame Ego-Motion for Visual Odometry with Convolutional Neural Network

Part of the Communications in Computer and Information Science book series (CCIS, volume 773)

Abstract

Visual odometry (VO) is one of the key components of visual SLAM systems, and several impressive VO methods have been presented recently. However, these methods mostly follow the traditional feature detection and tracking pipeline, which usually suffers from limited robustness in complex scenarios. Deep learning has demonstrated outstanding performance in various visual tasks and thus has great potential to improve VO. In this paper, we discuss how to learn an appropriate estimator that predicts the frame-2-frame ego-motion with a convolutional neural network. Specifically, we construct a CNN model that formulates pose regression as a supervised learning problem. The proposed architecture takes raw images and optical flow as input to predict the motion, and full trajectories can then be produced by iteratively composing the predicted frame-2-frame motions. We experimentally demonstrate the performance of the proposed method on a public dataset, where it achieves better ego-motion estimation than the baselines.
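
A minimal, hypothetical sketch of the kind of pipeline the abstract describes is given below, assuming PyTorch: a two-stream CNN in which one stream consumes the stacked RGB frame pair and the other the optical flow, with a fully connected head regressing a 6-DoF relative motion, followed by iterative composition of the predicted motions into a trajectory. All layer sizes, the names FrameToFramePoseNet and integrate_trajectory, and the homogeneous-matrix pose representation are illustrative assumptions, not the authors' published architecture.

```python
# Illustrative sketch only (PyTorch assumed): a generic two-stream CNN pose
# regressor and trajectory integrator; layer sizes and names are hypothetical.
import torch
import torch.nn as nn


class FrameToFramePoseNet(nn.Module):
    """Regress a 6-DoF frame-to-frame motion (3 translation + 3 rotation
    parameters) from a stacked RGB image pair and its optical flow."""

    def __init__(self):
        super().__init__()
        # Appearance stream: two RGB frames stacked along channels (6 ch).
        self.rgb_stream = self._make_stream(in_channels=6)
        # Motion stream: 2-channel optical flow (u, v).
        self.flow_stream = self._make_stream(in_channels=2)
        # Fused features from both streams -> 6-DoF motion regression head.
        self.head = nn.Sequential(
            nn.Linear(2 * 256, 128),
            nn.ReLU(inplace=True),
            nn.Linear(128, 6),
        )

    @staticmethod
    def _make_stream(in_channels):
        # A small conv stack with batch normalization, global-average-pooled
        # to a fixed-length feature vector regardless of input resolution.
        return nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=7, stride=2, padding=3),
            nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, kernel_size=5, stride=2, padding=2),
            nn.BatchNorm2d(128), nn.ReLU(inplace=True),
            nn.Conv2d(128, 256, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(256), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )

    def forward(self, image_pair, flow):
        # image_pair: (B, 6, H, W), flow: (B, 2, H, W)
        feats = torch.cat([self.rgb_stream(image_pair),
                           self.flow_stream(flow)], dim=1)
        return self.head(feats)  # (B, 6): [tx, ty, tz, rx, ry, rz]


def integrate_trajectory(relative_poses):
    """Chain predicted frame-to-frame motions (given here as 4x4 homogeneous
    matrices) into absolute camera poses by iterative composition."""
    pose = torch.eye(4)
    trajectory = [pose]
    for T in relative_poses:
        pose = pose @ T
        trajectory.append(pose)
    return trajectory
```

Training such a regressor amounts to minimizing a supervised loss (e.g., separate translation and rotation error terms) against ground-truth relative poses, which is the sense in which pose regression is formulated as a supervised learning problem.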

Keywords

  • Visual odometry
  • Ego-motion
  • CNNs

Acknowledgements

This work is partially supported by the National Natural Science Foundation of China under Grants 61673362 and 61233003, the Youth Innovation Promotion Association CAS, and the Fundamental Research Funds for the Central Universities.

Author information

Corresponding author

Correspondence to Zilei Wang.


Copyright information

© 2017 Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Qiao, M., Wang, Z. (2017). Learning the Frame-2-Frame Ego-Motion for Visual Odometry with Convolutional Neural Network. In: Computer Vision. CCCV 2017. Communications in Computer and Information Science, vol 773. Springer, Singapore. https://doi.org/10.1007/978-981-10-7305-2_43


  • DOI: https://doi.org/10.1007/978-981-10-7305-2_43


  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-7304-5

  • Online ISBN: 978-981-10-7305-2

  • eBook Packages: Computer Science, Computer Science (R0)