
Realtime Object-aware Monocular Depth Estimation in Onboard Systems

  • Regular Papers · Robot and Applications

International Journal of Control, Automation and Systems

Abstract

This paper proposes real-time object depth estimation using only a monocular camera on an onboard computer with a low-cost GPU. Our algorithm estimates scene depth from a sparse feature-based visual odometry algorithm and, in parallel, detects and tracks object bounding boxes using an existing object detection algorithm. The two algorithms share their results, i.e., features, motion, and bounding boxes, to handle both static and dynamic objects in the scene. We validate the scene depth accuracy of the sparse features quantitatively on KITTI against ground-truth depth maps built from LiDAR observations, and the depth of detected objects qualitatively on the Hyundai driving datasets with satellite maps. We also compare our depth maps with the results of supervised and unsupervised monocular depth estimation algorithms. The validation shows that, in terms of error and accuracy, our performance is comparable to that of monocular depth estimation algorithms trained indirectly from stereo image pairs or directly from depth images, and better than that of algorithms trained with monocular images only. In addition, we confirm that our computational load is much lighter than that of the learning-based methods while showing comparable performance.
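The core idea of assigning a depth to each detected object from the sparse visual-odometry features that fall inside its bounding box can be illustrated with a short sketch. This is a minimal illustration under assumed interfaces, not the authors' implementation: the function object_depths, the arrays feature_uv and feature_depth, and the bounding-box format are hypothetical names chosen for the example.

# Minimal sketch (not the authors' exact pipeline): estimate a per-object depth
# by combining sparse visual-odometry feature depths with detector bounding boxes.
# Function and variable names here are illustrative assumptions.
import numpy as np

def object_depths(feature_uv, feature_depth, boxes):
    """feature_uv: (N, 2) pixel coordinates of triangulated VO features.
    feature_depth: (N,) metric depths of those features in the camera frame.
    boxes: list of (x_min, y_min, x_max, y_max) detector bounding boxes.
    Returns one robust depth estimate per box (None if no features fall inside)."""
    depths = []
    for x0, y0, x1, y1 in boxes:
        inside = ((feature_uv[:, 0] >= x0) & (feature_uv[:, 0] <= x1) &
                  (feature_uv[:, 1] >= y0) & (feature_uv[:, 1] <= y1))
        if not np.any(inside):
            depths.append(None)          # no sparse depth cue for this object
        else:
            # median is robust to features that belong to the background
            depths.append(float(np.median(feature_depth[inside])))
    return depths

# Example: two detections, three tracked features
uv = np.array([[120.0, 200.0], [130.0, 210.0], [400.0, 150.0]])
z = np.array([8.2, 8.5, 23.1])
print(object_depths(uv, z, [(100, 180, 160, 240), (380, 130, 420, 170)]))

Taking the median of the feature depths inside a box is one simple way to discount background features that leak into the detection; the paper's actual handling of static and dynamic objects relies on the shared features, motion, and bounding boxes described above.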



Author information


Corresponding author

Correspondence to H. Jin Kim.

Additional information

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was supported in part by Hyundai MnSoft, Inc. We appreciate the permission to use the data for this article. This research was supported by the Unmanned Vehicles Core Technology Research and Development Program through the National Research Foundation of Korea (NRF) and the Unmanned Vehicle Advanced Research Center (UVARC) funded by the Ministry of Science and ICT, the Republic of Korea (NRF-2020M3C1C1A01086411).

Sangil Lee received his B.S. degree in electronic engineering and his M.S. degree in mechanical and aerospace engineering from Seoul National University in 2015 and 2017, respectively. He is currently pursuing a Ph.D. degree in mechanical and aerospace engineering at Seoul National University. His research interests include 3D robotic vision, visual odometry, and event cameras.

Chungkeun Lee received his B.S. degree in mechanical and aerospace engineering in 2014. He is currently pursuing a Ph.D. degree in mechanical and aerospace engineering at Seoul National University. His research interests are robotic applications of deep visual learning, including perception and depth estimation.

Haram Kim received his B.S. degree in electronic engineering and his M.S. degree in mechanical and aerospace engineering from Seoul National University in 2017 and 2019, respectively. He is currently pursuing a Ph.D. degree in mechanical and aerospace engineering at Seoul National University. His research interests include 3D robotic vision, visual odometry, and event cameras.

H. Jin Kim received her B.S. degree from the Korea Advanced Institute of Science and Technology (KAIST) in 1995, and her M.S. and Ph.D. degrees in mechanical engineering from the University of California, Berkeley, in 1999 and 2001, respectively. From 2002 to 2004, she was a Postdoctoral Researcher in electrical engineering and computer science at UC Berkeley. In 2004, she joined the Department of Mechanical and Aerospace Engineering at Seoul National University as an Assistant Professor, where she is currently a Professor. Her research interests include intelligent control of robotic systems and motion planning.


About this article


Cite this article

Lee, S., Lee, C., Kim, H. et al. Realtime Object-aware Monocular Depth Estimation in Onboard Systems. Int. J. Control Autom. Syst. 19, 3179–3189 (2021). https://doi.org/10.1007/s12555-020-0654-8

