
Building 3D Virtual Worlds from Monocular Images of Urban Road Traffic Scenes

  • Conference paper
Advances in Visual Computing (ISVC 2021)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 13018)


Abstract

Three-dimensional (3D) modeling of urban road scenes has attracted considerable interest in entertainment, urban planning, and autonomous vehicle simulation. Modeling such realistic scenes is still predominantly a manual process, relying mainly on 3D artists. Cameras mounted on vehicles can now provide images of road scenes, which can be used as references for automating scene layout. Our goal is to use information from such images, captured by a single camera sensor on a moving vehicle, to build an approximate 3D virtual world. We propose a workflow that takes the human out of the loop through the use of deep learning to generate a dense depth map, an inverse projection to correct for perspective distortion in the image, collision detection, and a rendering engine. The engine loads and displays 3D models belonging to a particular type, at accurate relative positions, thus building and rendering a virtual world corresponding to the image. This virtual world can then be edited and animated. Our proposed workflow can potentially speed up the process of modeling virtual environments significantly when integrated with a modeling tool. We have tested the efficacy of our 3D virtual world-building and rendering using user studies with image-to-image similarity and video-to-image correspondences. Even with limited photorealistic rendering, our user study results demonstrate that 3D world-building can be done effectively, with minimal human intervention, using our workflow with monocular images from moving vehicles as inputs.
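The inverse projection mentioned in the workflow can be illustrated with a minimal sketch: given a per-pixel depth estimate from the deep network, a pixel is lifted back into 3D camera space by inverting the pinhole camera model. This is an assumption-laden illustration, not the paper's implementation; the intrinsic parameters below are hypothetical placeholder values, not calibration data from the paper.

```python
import numpy as np

def back_project(u, v, depth, fx, fy, cx, cy):
    """Invert the pinhole projection: map pixel (u, v) with metric
    depth (along the optical axis) to a 3D camera-space point.

    fx, fy: focal lengths in pixels; cx, cy: principal point.
    """
    X = (u - cx) * depth / fx
    Y = (v - cy) * depth / fy
    return np.array([X, Y, depth])

# Illustrative (hypothetical) intrinsics for a forward-facing camera.
fx = fy = 721.5
cx, cy = 609.6, 172.9

# A pixel at the principal point maps straight down the optical axis.
point = back_project(609.6, 172.9, 5.0, fx, fy, cx, cy)
```

Repeating this per detected object (using the object's image position and estimated depth) yields the relative 3D positions at which the rendering engine can place type-matched models.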


Notes

  1. We refer to the axes in image space as X-Y, and those in 3D world space as X-Y-Z.


Acknowledgments

The authors are grateful to all members of the Graphics-Visualization-Computing Lab and peers at the IIITB for their support. This work has been financially supported by the Machine Intelligence and Robotics (MINRO) grant by the Government of Karnataka.

Author information

Corresponding author

Correspondence to Jaya Sreevalsan-Nair.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Victor, A.C., Sreevalsan-Nair, J. (2021). Building 3D Virtual Worlds from Monocular Images of Urban Road Traffic Scenes. In: Bebis, G., et al. (eds.) Advances in Visual Computing. ISVC 2021. Lecture Notes in Computer Science, vol. 13018. Springer, Cham. https://doi.org/10.1007/978-3-030-90436-4_37


  • DOI: https://doi.org/10.1007/978-3-030-90436-4_37

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-90435-7

  • Online ISBN: 978-3-030-90436-4

  • eBook Packages: Computer Science, Computer Science (R0)
