
Real-time Depth Estimation Using Recurrent CNN with Sparse Depth Cues for SLAM System

  • Regular Papers
  • Published in: International Journal of Control, Automation and Systems

Abstract

Depth maps are used to refine geometric information in a variety of fields, such as 3D reconstruction and pose estimation in SLAM systems, where ill-posed problems arise. As learning-based approaches have been successfully applied across vision-based fields, several CNN-based depth estimation algorithms have been proposed, but they train on spatial information only. Since the image sequences or videos used in SLAM systems also carry temporal information, this paper proposes a recurrent CNN architecture for SLAM systems that estimates depth maps by exploiting not only spatial but also temporal information, using a convolutional GRU cell constructed to remember the weights of past convolutional layers. Furthermore, this paper proposes additional layers that preserve scene structure by utilizing sparse depth cues obtained from the SLAM system. The sparse depth cues are produced by projecting the reconstructed 3D map into each camera frame, and they help the network predict accurate depth maps by avoiding the ambiguity of generating depth for structures not represented in the latent space. Although the accuracy of depth cues from a monocular SLAM system is lower than from a stereo SLAM system, the proposed masking approach, which weighs the confidence of each depth cue according to the relative camera pose between the current and previous frames, maintains performance together with the proposed adaptive regularization in the loss function. In the training phase, ground-truth depth maps are preprocessed with exponential quantization to eliminate the ill effects of large captured distances, and the resulting depth map predictions improve over the baseline methods while achieving real-time operation.
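The exponential quantization of ground-truth depth mentioned above is not specified in detail in this abstract; the sketch below assumes a common log-space scheme in which depths are clipped to a working range and mapped to a fixed number of discrete levels, so that large distances are compressed instead of dominating a linear target range. The range `d_min`/`d_max` and level count `K` are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Illustrative log-space depth quantization (an assumed scheme, not the
# paper's exact formulation). Depths are clipped to [d_min, d_max] and
# mapped to K discrete levels spaced uniformly in log-depth, so nearby
# depths get finer resolution than distant ones.
def quantize_depth(d, d_min=1.0, d_max=80.0, K=128):
    d = np.clip(np.asarray(d, dtype=np.float64), d_min, d_max)
    t = (np.log(d) - np.log(d_min)) / (np.log(d_max) - np.log(d_min))
    return np.round(t * (K - 1)).astype(np.int64)

def dequantize_depth(q, d_min=1.0, d_max=80.0, K=128):
    t = np.asarray(q, dtype=np.float64) / (K - 1)
    return np.exp(np.log(d_min) + t * (np.log(d_max) - np.log(d_min)))
```

Because the levels are uniform in log space, the metric step between consecutive levels grows multiplicatively with depth, which mitigates the ill effects of a few very large captured distances on the training target.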
We expect the proposed system to be used within SLAM systems to refine geometric information for more accurate 3D reconstruction and pose estimation, which are essential for robust robot navigation.
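As a concrete illustration of the convolutional GRU cell the abstract builds on, here is a minimal NumPy sketch. For brevity it uses 1×1 (pointwise) convolutions in place of the spatial kernels a real ConvGRU would use, and random untrained weights; all names and sizes are illustrative and not taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv1x1(x, w):
    # x: (channels, H, W), w: (out_channels, in_channels).
    # A real ConvGRU would use spatial (e.g. 3x3) kernels here.
    return np.einsum('chw,oc->ohw', x, w)

class ConvGRUCell:
    """Minimal convolutional GRU: gates are computed with convolutions,
    so the hidden state keeps its spatial layout across time steps."""
    def __init__(self, c_in, c_hid, seed=0):
        rng = np.random.default_rng(seed)
        init = lambda *shape: rng.normal(0.0, 0.1, shape)
        self.wz, self.uz = init(c_hid, c_in), init(c_hid, c_hid)  # update gate
        self.wr, self.ur = init(c_hid, c_in), init(c_hid, c_hid)  # reset gate
        self.wh, self.uh = init(c_hid, c_in), init(c_hid, c_hid)  # candidate

    def step(self, x, h):
        z = sigmoid(conv1x1(x, self.wz) + conv1x1(h, self.uz))
        r = sigmoid(conv1x1(x, self.wr) + conv1x1(h, self.ur))
        h_cand = np.tanh(conv1x1(x, self.wh) + conv1x1(r * h, self.uh))
        return (1.0 - z) * h + z * h_cand

# Run a short frame sequence through the cell; the hidden feature map
# carries temporal context from frame to frame, which is what lets the
# recurrent architecture exploit video rather than single images.
cell = ConvGRUCell(c_in=3, c_hid=8)
h = np.zeros((8, 16, 16))
rng = np.random.default_rng(1)
for _ in range(4):
    frame = rng.normal(size=(3, 16, 16))
    h = cell.step(frame, h)
```

In a depth-estimation encoder-decoder such as the one described, a cell like this would sit between convolutional layers so that each frame's features are gated against the remembered features of previous frames.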



Author information


Corresponding author

Correspondence to Sung Soo Hwang.

Additional information

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Recommended by Associate Editor Hyun Myung under the direction of Editor Jessie (Ju H.) Park. This work was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education [No. 2016R1D1A3B03934808].

Sang Jun Lee received his B.S. degree in Computer Science and Engineering from Handong Global University, Pohang, Korea, in 2017. He is currently pursuing an M.S. degree in the Department of Information Technology at Handong Global University. His research interests include SLAM systems for the localization of self-driving cars, robots, and augmented/virtual reality users, as well as 3D reconstruction and the optimization of these technologies using machine learning.

Heeyoul Choi is an assistant professor at Handong Global University. He was a visiting researcher at MILA, University of Montreal, from 2015 to 2016. He worked at the Samsung Advanced Institute of Technology for five years and was a post-doctoral researcher in Psychological and Brain Sciences at Indiana University, Indiana, from 2010 to 2011. He received his B.S. and M.S. degrees from Pohang University of Science and Technology, Korea, in 2002 and 2005, respectively, and his Ph.D. degree from Texas A&M University, Texas, in 2010. His research interests cover deep learning and cognitive science.

Sung Soo Hwang received his B.S. degree in Electrical Engineering and Computer Science from Handong Global University, Pohang, Korea, in 2008, and his M.S. and Ph.D. degrees from the Korea Advanced Institute of Science and Technology, Daejeon, Korea, in 2010 and 2015, respectively. His research interests include image-based 3D modeling, 3D data compression, augmented reality, and simultaneous localization and mapping (SLAM) systems.



About this article


Cite this article

Lee, S.J., Choi, H. & Hwang, S.S. Real-time Depth Estimation Using Recurrent CNN with Sparse Depth Cues for SLAM System. Int. J. Control Autom. Syst. 18, 206–216 (2020). https://doi.org/10.1007/s12555-019-0350-8


Keywords: Navigation