
Accurate and robust odometry by fusing monocular visual, inertial, and wheel encoder

  • Regular Paper
  • CCF Transactions on Pervasive Computing and Interaction


Tracking the pose of a robot has been gaining importance in robotics, e.g., as the basis for robot navigation. In recent years, monocular visual–inertial odometry (VIO) has been widely used for pose estimation owing to its good performance and low cost. However, VIO cannot estimate the scale or orientation accurately when a robot moves along straight lines or circular arcs on the ground. To address this problem, we incorporate the wheel encoder, which provides stable translation information and is subject only to small accumulated errors and momentary slippage errors. By jointly considering the kinematic constraints and the planar motion characteristics, we propose an odometry algorithm that tightly couples a monocular camera, an IMU, and a wheel encoder to achieve robust and accurate pose sensing for mobile robots. The algorithm consists of three main steps. First, we present the wheel encoder preintegration theory and noise propagation formula based on the kinematic model of the mobile robot, which form the basis of accurate estimation in backend optimization. Second, we adopt a robust initialization method that makes full use of the camera, IMU, and wheel encoder measurements to obtain good initial values of the gyroscope bias and visual scale in practice. Third, we bound the high computational complexity with a marginalization strategy that conditionally eliminates unnecessary measurements in the sliding window. We implement a prototype and conduct extensive experiments, which show that our system achieves robust and accurate pose estimation in terms of scale, orientation, and location, compared with the state of the art.
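To make the first step more concrete, the following is a minimal sketch of what planar wheel encoder preintegration between two keyframes could look like. It is not the formulation from the paper: it assumes a differential-drive robot, midpoint integration, and hypothetical parameter names (`ticks_per_rev`, `wheel_radius`, `wheel_base`), and it omits noise propagation entirely.

```python
import math
from dataclasses import dataclass


@dataclass
class PlanarDelta:
    """Relative planar motion expressed in the frame of the first keyframe."""
    dx: float = 0.0      # translation along the start frame's x axis (m)
    dy: float = 0.0      # translation along the start frame's y axis (m)
    dtheta: float = 0.0  # accumulated heading change (rad)


def preintegrate_encoder(ticks, ticks_per_rev, wheel_radius, wheel_base):
    """Accumulate encoder increments into one relative-motion term.

    ticks: iterable of (left_ticks, right_ticks) increments, one pair per
           encoder sample between the two keyframes.
    All parameter names are illustrative assumptions, not taken from the paper.
    """
    metres_per_tick = 2.0 * math.pi * wheel_radius / ticks_per_rev
    delta = PlanarDelta()
    for left, right in ticks:
        d_left = left * metres_per_tick
        d_right = right * metres_per_tick
        d_centre = 0.5 * (d_left + d_right)          # distance of the axle centre
        d_theta = (d_right - d_left) / wheel_base    # differential-drive yaw step
        # Midpoint rule: use the heading halfway through this sample.
        heading = delta.dtheta + 0.5 * d_theta
        delta.dx += d_centre * math.cos(heading)
        delta.dy += d_centre * math.sin(heading)
        delta.dtheta += d_theta
    return delta


# Ten samples of equal left/right ticks: the robot drives straight ahead.
straight = preintegrate_encoder([(100, 100)] * 10,
                                ticks_per_rev=1000,
                                wheel_radius=0.05,
                                wheel_base=0.3)
```

Because the accumulated delta depends only on the encoder readings, it can be computed once and reused as a relative-motion constraint while the optimizer updates the keyframe poses, which is the general appeal of preintegration.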





This work is supported by the National Natural Science Foundation of China (nos. 61702257 and 61771236), the Natural Science Foundation of Jiangsu Province (no. BK20170648), the Fundamental Research Funds for the Central Universities (14380066), and the Collaborative Innovation Center of Novel Software Technology and Industrialization. Jia Liu and Lijun Chen are the corresponding authors.


Corresponding authors

Correspondence to Jia Liu or Lijun Chen.


Cite this article

Niu, Y., Liu, J., Wang, X. et al. Accurate and robust odometry by fusing monocular visual, inertial, and wheel encoder. CCF Trans. Pervasive Comp. Interact. 2, 275–287 (2020).
