Abstract
Three-dimensional (3D) modeling of urban road scenes has garnered considerable interest in entertainment, urban planning, and autonomous vehicle simulation. Building such realistic scenes, however, remains a predominantly manual process that relies on 3D artists. Cameras mounted on vehicles now routinely provide images of road scenes, which can serve as references for automating scene layout. Our goal is to use the information in such images, captured by a single camera sensor on a moving vehicle, to build an approximate 3D virtual world. We propose a workflow that takes the human out of the loop by combining deep learning to generate a dense depth map, an inverse projection to correct for perspective distortion in the image, collision detection, and a rendering engine. The engine loads and displays 3D models of the appropriate object types at accurate relative positions, thus building and rendering a virtual world corresponding to the image. This virtual world can then be edited and animated. When integrated with a modeling tool, our workflow can potentially speed up the modeling of virtual environments significantly. We have evaluated the efficacy of our 3D virtual world-building and rendering through user studies of image-to-image similarity and video-to-image correspondences. Even with limited photorealism in the rendering, our user study results demonstrate that 3D world-building can be done effectively, with minimal human intervention, using our workflow with monocular images from moving vehicles as input.
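As a concrete illustration of the inverse-projection step mentioned in the abstract, the following is a minimal sketch, assuming a standard pinhole camera model with known intrinsics; the function name `back_project`, the parameter names (`fx`, `fy`, `cx`, `cy`), the NumPy implementation, and the KITTI-like intrinsic values in the usage line are our illustrative assumptions, not the authors' code. It maps each image-space X-Y pixel, together with its predicted depth, to an X-Y-Z point in camera coordinates (see the note on axis conventions below).

```python
import numpy as np

def back_project(depth, fx, fy, cx, cy):
    """Inverse-project a dense depth map to 3D points (pinhole model).

    depth : (H, W) array of metric depths along the camera Z axis.
    fx, fy : focal lengths in pixels; cx, cy : principal point.
    Returns an (H, W, 3) array of X-Y-Z points in camera coordinates.
    """
    h, w = depth.shape
    # Image-space X-Y pixel grid: u varies across columns, v across rows.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx  # undo perspective scaling along X
    y = (v - cy) * depth / fy  # undo perspective scaling along Y
    return np.stack([x, y, depth], axis=-1)

# Usage with a synthetic constant-depth map and KITTI-like intrinsics
# (illustrative values only): every point lands on the plane Z = 10 m.
pts = back_project(np.full((375, 1242), 10.0),
                   fx=721.5, fy=721.5, cx=609.6, cy=172.9)
```

Placing objects at these back-projected positions is what lets the rendering engine reproduce the relative layout of the scene without manual placement.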
Notes
1. We refer to the axes in image space as X-Y, and those in 3D world space as X-Y-Z.
Acknowledgments
The authors are grateful to all members of the Graphics-Visualization-Computing Lab and peers at IIITB for their support. This work has been financially supported by the Machine Intelligence and Robotics (MINRO) grant from the Government of Karnataka.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Victor, A.C., Sreevalsan-Nair, J. (2021). Building 3D Virtual Worlds from Monocular Images of Urban Road Traffic Scenes. In: Bebis, G., et al. (eds.) Advances in Visual Computing. ISVC 2021. Lecture Notes in Computer Science, vol. 13018. Springer, Cham. https://doi.org/10.1007/978-3-030-90436-4_37
DOI: https://doi.org/10.1007/978-3-030-90436-4_37
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-90435-7
Online ISBN: 978-3-030-90436-4
eBook Packages: Computer Science, Computer Science (R0)