Abstract
Generating depth maps from mono or stereo images is a topic of active research. This paper is dedicated to the study of different methods for generating depth and disparity maps. It includes an analysis of existing depth map generation methods as well as an investigation and improvement of the real-time neural-network-based method AnyNet. Our approach, AnyNet-M, belongs to the family of models that make predictions stage by stage, increasing the quality of disparity estimation over time. First, the proposed method was tested on the KITTI Stereo dataset and the custom OpenTaganrog dataset with images of 1242 × 375 resolution. The method was shown to run in real time at approximately 38 FPS on a single Tesla V100 GPU. Second, the quality of the proposed model was evaluated and reached a 5% Average 3-Pixel Error on the KITTI dataset. Finally, it was integrated into the Robot Operating System (ROS) for further use as part of the navigation systems of unmanned vehicles.
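For readers unfamiliar with the reported metric, the sketch below illustrates how a KITTI-style Average 3-Pixel Error can be computed from dense disparity maps. It implements the standard bad-pixel rate; the function name and the zero-valued validity mask are illustrative assumptions, not the authors' actual evaluation code.

```python
# Minimal sketch of a KITTI-style 3-pixel error metric, assuming the
# estimated and ground-truth disparities are NumPy float arrays of
# equal shape. Not the paper's exact evaluation pipeline.
import numpy as np

def three_pixel_error(disp_est: np.ndarray, disp_gt: np.ndarray) -> float:
    """Fraction of valid pixels whose absolute disparity error exceeds 3 px."""
    valid = disp_gt > 0                    # KITTI marks invalid pixels with 0
    err = np.abs(disp_est[valid] - disp_gt[valid])
    return float(np.mean(err > 3.0))       # e.g. 0.05 corresponds to 5%
```

Note that KITTI stores ground-truth disparity as 16-bit PNGs scaled by 256, so the raw values must be divided by 256 before applying such a metric.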
Acknowledgments
The theoretical investigation and methodology in this work were supported by the Russian Science Foundation under Grant 20-71-10116. The authors are also grateful to Integrant LLC for consulting and for providing a dataset obtained with a real robot for the experimental research.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Kasatkin, N., Yudin, D. (2022). Real-Time Approach to Neural Network-Based Disparity Map Generation from Stereo Images. In: Kryzhanovsky, B., Dunin-Barkowski, W., Redko, V., Tiumentsev, Y., Klimov, V.V. (eds) Advances in Neural Computation, Machine Learning, and Cognitive Research V. NEUROINFORMATICS 2021. Studies in Computational Intelligence, vol 1008. Springer, Cham. https://doi.org/10.1007/978-3-030-91581-0_35
DOI: https://doi.org/10.1007/978-3-030-91581-0_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-91580-3
Online ISBN: 978-3-030-91581-0
eBook Packages: Intelligent Technologies and Robotics (R0)