Guided Feature Selection for Deep Visual Odometry

  • Fei Xue
  • Qiuyuan Wang
  • Xin Wang
  • Wei Dong
  • Junqiu Wang
  • Hongbin Zha
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11366)


We present a novel end-to-end visual odometry architecture with guided feature selection, built on deep convolutional recurrent neural networks. Unlike current monocular visual odometry methods, our approach rests on the intuition that features contribute differently to different motion patterns. Specifically, we propose a dual-branch recurrent network that learns rotation and translation separately, using a Convolutional Neural Network (CNN) for feature representation and a Recurrent Neural Network (RNN) for reasoning over image sequences. To strengthen feature selection, we further introduce a context-aware guidance mechanism that explicitly forces each branch to distill the information relevant to its own motion pattern. Experiments on the prevalent KITTI and ICL_NUIM benchmarks show that our method outperforms current state-of-the-art model-based and learning-based methods for both decoupled and joint camera pose recovery.
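
The dual-branch idea can be illustrated with a small sketch. The abstract does not specify how the guidance mechanism is computed, so the code below assumes a simple form: each branch derives a per-channel sigmoid gate from a context vector and multiplies it into the shared features before regressing three values. The class `GuidedBranch`, the helper names, the dimensions, and the random vectors standing in for CNN features and recurrent context are all hypothetical and are not taken from the paper.

```python
import math
import random


def sigmoid(x):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def linear(weights, bias, vector):
    """Dense layer: one output per weight row, no nonlinearity."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def random_layer(rng, n_out, n_in):
    """Hypothetical initializer: small uniform weights, zero bias."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class GuidedBranch:
    """One motion branch: gate the shared features, then regress 3 values."""

    def __init__(self, rng, feature_dim, context_dim):
        self.gate_w, self.gate_b = random_layer(rng, feature_dim, context_dim)
        self.head_w, self.head_b = random_layer(rng, 3, feature_dim)

    def select(self, features, context):
        # Assumed guidance: a per-channel weight in (0, 1) computed from context.
        gates = [sigmoid(g) for g in linear(self.gate_w, self.gate_b, context)]
        return [g * f for g, f in zip(gates, features)], gates

    def forward(self, features, context):
        selected, _ = self.select(features, context)
        return linear(self.head_w, self.head_b, selected)


def estimate_pose(features, context, rotation_branch, translation_branch):
    """Return (rotation, translation), each a list of three floats."""
    return (rotation_branch.forward(features, context),
            translation_branch.forward(features, context))


rng = random.Random(0)
FEATURE_DIM, CONTEXT_DIM = 8, 4
rotation_branch = GuidedBranch(rng, FEATURE_DIM, CONTEXT_DIM)
translation_branch = GuidedBranch(rng, FEATURE_DIM, CONTEXT_DIM)

# Stand-ins for CNN features of a frame pair and a recurrent context vector.
features = [rng.uniform(-1.0, 1.0) for _ in range(FEATURE_DIM)]
context = [rng.uniform(-1.0, 1.0) for _ in range(CONTEXT_DIM)]

rotation, translation = estimate_pose(
    features, context, rotation_branch, translation_branch)
```

Because each branch owns its gate weights, the two branches can weight the same shared feature channels differently, which is the intuition the abstract describes. A real system would learn these weights end to end rather than draw them at random.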


Keywords: Visual odometry, Recurrent neural networks, Feature selection



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Key Laboratory of Machine Perception (MOE), School of EECS, Peking University, Beijing, China
  2. Robotics Institute, Carnegie Mellon University, Pittsburgh, USA
  3. Beijing Changcheng Aviation Measurement and Control Institute, Beijing, China
  4. Cooperative Medianet Innovation Center, Shanghai Jiao Tong University, Shanghai, China
