International Journal of Computer Vision, Volume 78, Issue 2, pp 121–141

3D Urban Scene Modeling Integrating Recognition and Reconstruction


  • N. Cornelis
    • KU Leuven
  • Bastian Leibe
    • ETH Zurich
  • Kurt Cornelis
    • KU Leuven
  • Luc Van Gool
    • KU Leuven
    • ETH Zurich

DOI: 10.1007/s11263-007-0081-9

Cite this article as:
Cornelis, N., Leibe, B., Cornelis, K. et al. Int J Comput Vis (2008) 78: 121. doi:10.1007/s11263-007-0081-9


Realistically textured 3D city models at ground level promise to be useful for pre-visualizing upcoming traffic situations in car navigation systems. Because this pre-visualization can be rendered from the expected future viewpoints of the driver, the required maneuver becomes easier to understand. 3D city models can be reconstructed from the imagery recorded by surveying vehicles. The sheer volume of image material gathered by these vehicles, however, places extreme demands on vision algorithms to ensure their practical usability. Algorithms need to be as fast as possible and should result in compact, memory-efficient 3D city models for ease of future distribution and visualization. For the considered application, these are not contradictory demands. Simplified geometry assumptions can speed up vision algorithms while automatically guaranteeing compact geometry models. In this paper, we present a novel city modeling framework which builds upon this philosophy to create 3D content at high speed.

Objects in the environment, such as cars and pedestrians, may, however, disturb the reconstruction: they violate the simplified geometry assumptions, which leads to visually unpleasant artifacts and degrades the visual realism of the resulting 3D city model. Unfortunately, such objects are prevalent in urban scenes. We therefore extend the reconstruction framework by integrating it with an object recognition module that automatically detects cars in the input video streams and localizes them in 3D. The two components of our system are tightly integrated and benefit from each other’s continuous input. 3D reconstruction delivers geometric scene context, which greatly helps improve detection precision. The detected car locations, in turn, are used to instantiate virtual placeholder models which augment the visual realism of the reconstructed city model.
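To illustrate how geometric scene context can improve detection precision, here is a minimal sketch of one common form of such a check: rejecting detections whose implied real-world height is implausible for a car standing on the ground plane. This is not the method of the paper, whose abstract does not specify the mechanism. The sketch assumes a pinhole camera looking horizontally over a flat ground plane, and the names `Detection`, `implied_height_m`, `filter_by_ground_plane`, `horizon_row`, `camera_height_m` and the height bounds are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Detection:
    """Hypothetical 2D detection; image rows grow downward."""
    y_top: float     # top row of the bounding box (pixels)
    y_bottom: float  # bottom row of the bounding box (pixels)
    score: float     # detector confidence


def implied_height_m(det: Detection, horizon_row: float,
                     camera_height_m: float) -> Optional[float]:
    """Real-world height implied by a box whose bottom edge rests on the ground.

    Assumes a pinhole camera with a horizontal optical axis over a flat ground
    plane. A ground point at depth Z projects to row
    horizon_row + f * camera_height / Z, so the focal length f cancels and
    H = box_height_px * camera_height / (y_bottom - horizon_row).
    """
    drop = det.y_bottom - horizon_row
    if drop <= 0:
        # Box bottom at or above the horizon: it cannot touch the ground plane.
        return None
    return (det.y_bottom - det.y_top) * camera_height_m / drop


def filter_by_ground_plane(dets: List[Detection], horizon_row: float,
                           camera_height_m: float,
                           min_h: float = 1.0,
                           max_h: float = 2.5) -> List[Detection]:
    """Keep detections whose implied height lies in an assumed car range."""
    kept = []
    for det in dets:
        h = implied_height_m(det, horizon_row, camera_height_m)
        if h is not None and min_h <= h <= max_h:
            kept.append(det)
    return kept
```

With a camera 1.5 m above the ground and the horizon at row 240, a box spanning rows 240 to 340 implies a height of 1.5 m and is kept, while a box spanning rows 60 to 260 implies 15 m and is discarded. In a real system the horizon and camera height would come from the reconstruction rather than being fixed constants.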


Keywords

City modeling · Structure from motion · 3D reconstruction · Object detection · Temporal integration

Copyright information

© Springer Science+Business Media, LLC 2007