
Real-time Depth Estimation Using Recurrent CNN with Sparse Depth Cues for SLAM System

  • Regular Papers
  • Published in: International Journal of Control, Automation and Systems

Abstract

Depth maps are used to refine geometric information in a variety of fields, such as 3D reconstruction and pose estimation in SLAM systems, where ill-posed problems arise. As learning-based approaches have been successfully applied across vision-based fields, several CNN-based depth estimation algorithms have been proposed, but they train on spatial information only. Since the image sequences or videos used in SLAM systems also carry temporal information, this paper proposes a recurrent CNN architecture for SLAM systems that estimates depth maps by exploiting not only spatial but also temporal information, using a convolutional GRU cell constructed to remember the weights of past convolutional layers. Furthermore, this paper proposes additional layers that preserve scene structure by utilizing sparse depth cues obtained from the SLAM system. The sparse depth cues are produced by projecting the reconstructed 3D map into each camera frame, and they help the network predict accurate depth maps by avoiding the ambiguity of generating depth for structures not represented in the latent space. Although the accuracy of depth cues from a monocular SLAM system is lower than from a stereo SLAM system, the proposed masking approach, which weighs the confidence of each depth cue according to the relative camera pose between the current and previous frames, maintains performance together with the proposed adaptive regularization in the loss function. In the training phase, ground-truth depth maps are preprocessed with exponential quantization to eliminate the ill effects of large captured distances, and the resulting depth map predictions improve over the baseline methods while achieving real-time operation.
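The exponential quantization of ground-truth depth mentioned above is not specified in detail in this abstract; the sketch below assumes a common log-space scheme in which depths are clipped to a working range and mapped to a fixed number of discrete levels, so that large distances are compressed instead of dominating a linear target range. The range `d_min`/`d_max` and level count `K` are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Illustrative log-space depth quantization (an assumed scheme, not the
# paper's exact formulation). Depths are clipped to [d_min, d_max] and
# mapped to K discrete levels spaced uniformly in log-depth, so nearby
# depths get finer resolution than distant ones.
def quantize_depth(d, d_min=1.0, d_max=80.0, K=128):
    d = np.clip(np.asarray(d, dtype=np.float64), d_min, d_max)
    t = (np.log(d) - np.log(d_min)) / (np.log(d_max) - np.log(d_min))
    return np.round(t * (K - 1)).astype(np.int64)

def dequantize_depth(q, d_min=1.0, d_max=80.0, K=128):
    t = np.asarray(q, dtype=np.float64) / (K - 1)
    return np.exp(np.log(d_min) + t * (np.log(d_max) - np.log(d_min)))
```

Because the levels are uniform in log space, the metric step between consecutive levels grows multiplicatively with depth, which mitigates the ill effects of a few very large captured distances on the training target.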
We expect the proposed system to be used within SLAM systems to refine geometric information for more accurate 3D reconstruction and pose estimation, which are essential for robust robot navigation.
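As a concrete illustration of the convolutional GRU cell the abstract builds on, here is a minimal NumPy sketch. For brevity it uses 1×1 (pointwise) convolutions in place of the spatial kernels a real ConvGRU would use, and random untrained weights; all names and sizes are illustrative and not taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv1x1(x, w):
    # x: (channels, H, W), w: (out_channels, in_channels).
    # A real ConvGRU would use spatial (e.g. 3x3) kernels here.
    return np.einsum('chw,oc->ohw', x, w)

class ConvGRUCell:
    """Minimal convolutional GRU: gates are computed with convolutions,
    so the hidden state keeps its spatial layout across time steps."""
    def __init__(self, c_in, c_hid, seed=0):
        rng = np.random.default_rng(seed)
        init = lambda *shape: rng.normal(0.0, 0.1, shape)
        self.wz, self.uz = init(c_hid, c_in), init(c_hid, c_hid)  # update gate
        self.wr, self.ur = init(c_hid, c_in), init(c_hid, c_hid)  # reset gate
        self.wh, self.uh = init(c_hid, c_in), init(c_hid, c_hid)  # candidate

    def step(self, x, h):
        z = sigmoid(conv1x1(x, self.wz) + conv1x1(h, self.uz))
        r = sigmoid(conv1x1(x, self.wr) + conv1x1(h, self.ur))
        h_cand = np.tanh(conv1x1(x, self.wh) + conv1x1(r * h, self.uh))
        return (1.0 - z) * h + z * h_cand

# Run a short frame sequence through the cell; the hidden feature map
# carries temporal context from frame to frame, which is what lets the
# recurrent architecture exploit video rather than single images.
cell = ConvGRUCell(c_in=3, c_hid=8)
h = np.zeros((8, 16, 16))
rng = np.random.default_rng(1)
for _ in range(4):
    frame = rng.normal(size=(3, 16, 16))
    h = cell.step(frame, h)
```

In a depth-estimation encoder-decoder such as the one described, a cell like this would sit between convolutional layers so that each frame's features are gated against the remembered features of previous frames.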



Author information


Corresponding author

Correspondence to Sung Soo Hwang.

Additional information

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Recommended by Associate Editor Hyun Myung under the direction of Editor Jessie (Ju H.) Park. This work was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education [No. 2016R1D1A3B03934808].

Sang Jun Lee received his B.S. degree in Computer Science and Engineering from Handong Global University, Pohang, Korea, in 2017. He is currently pursuing an M.S. degree in the Department of Information Technology at Handong Global University. His research interests include SLAM systems for the localization of self-driving cars, robots, and augmented/virtual reality users, as well as 3D reconstruction and the optimization of these technologies using machine learning.

Heeyoul Choi is an assistant professor at Handong Global University. He was a visiting researcher at MILA, University of Montreal, from 2015 to 2016. He worked at the Samsung Advanced Institute of Technology for five years and was a post-doctoral researcher in Psychological and Brain Sciences at Indiana University, Indiana, from 2010 to 2011. He received his B.S. and M.S. degrees from Pohang University of Science and Technology, Korea, in 2002 and 2005, respectively, and his Ph.D. degree from Texas A&M University, Texas, in 2010. His research interests cover deep learning and cognitive science.

Sung Soo Hwang received his B.S. degree in Electrical Engineering and Computer Science from Handong Global University, Pohang, Korea, in 2008, and his M.S. and Ph.D. degrees from the Korea Advanced Institute of Science and Technology, Daejeon, Korea, in 2010 and 2015, respectively. His research interests include image-based 3D modeling, 3D data compression, augmented reality, and simultaneous localization and mapping (SLAM) systems.



About this article


Cite this article

Lee, S.J., Choi, H. & Hwang, S.S. Real-time Depth Estimation Using Recurrent CNN with Sparse Depth Cues for SLAM System. Int. J. Control Autom. Syst. 18, 206–216 (2020). https://doi.org/10.1007/s12555-019-0350-8


Keywords: Navigation