
3D Urban Scene Modeling Integrating Recognition and Reconstruction

International Journal of Computer Vision

Abstract

Supplying realistically textured 3D city models at ground level promises to be useful for pre-visualizing upcoming traffic situations in car navigation systems. Because this pre-visualization can be rendered from the driver's expected future viewpoints, the required maneuver becomes easier to understand. 3D city models can be reconstructed from the imagery recorded by surveying vehicles. The sheer volume of imagery gathered by these vehicles, however, places extreme demands on vision algorithms if they are to be practically usable. Algorithms need to be as fast as possible and should produce compact, memory-efficient 3D city models that are easy to distribute and visualize. For this application, the two demands do not conflict: simplified geometry assumptions can speed up vision algorithms while automatically guaranteeing compact geometry models. In this paper, we present a novel city modeling framework that builds on this philosophy to create 3D content at high speed.
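As a rough illustration of why a simplified geometry assumption yields compact models, consider the following minimal sketch. It is not necessarily the paper's exact model. It assumes vertical facades seen by an upright pinhole camera over a flat ground plane. Under that assumption, every pixel in an image column that falls on a facade shares one depth. A street side can then be stored as one depth value per column and meshed as a vertical ribbon of quads. The function name `facade_ribbon`, its parameters (`focal`, `cx`, `facade_height`) and the axis convention are all hypothetical.

```python
from typing import List, Sequence, Tuple

Vertex = Tuple[float, float, float]
Triangle = Tuple[int, int, int]


def facade_ribbon(depths: Sequence[float], focal: float, cx: float,
                  facade_height: float) -> Tuple[List[Vertex], List[Triangle]]:
    """Build a vertical ribbon mesh from one facade depth per image column.

    Hypothetical convention: x to the right, y up from the ground plane (y = 0),
    z forward along the optical axis; column u has depth depths[u].
    """
    vertices: List[Vertex] = []
    for u, z in enumerate(depths):
        x = (u - cx) * z / focal                # back-project column u at depth z
        vertices.append((x, 0.0, z))            # bottom vertex on the ground plane
        vertices.append((x, facade_height, z))  # top vertex at the assumed facade height
    triangles: List[Triangle] = []
    for i in range(len(depths) - 1):
        b0, t0, b1, t1 = 2 * i, 2 * i + 1, 2 * i + 2, 2 * i + 3
        # two triangles span the quad between columns i and i + 1
        triangles.append((b0, b1, t0))
        triangles.append((t0, b1, t1))
    return vertices, triangles


if __name__ == "__main__":
    verts, tris = facade_ribbon([10.0] * 640, focal=500.0, cx=320.0, facade_height=12.0)
    print(len(verts), len(tris))
```

With 640 columns, this yields 1280 vertices and 1278 triangles. A dense per-pixel depth map of a 640 × 480 image of the same view would hold 307,200 values. The storage cost grows with the number of columns rather than the number of pixels.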

However, objects in the environment, such as cars and pedestrians, may disturb the reconstruction because they violate the simplified geometry assumptions. This leads to visually unpleasant artifacts that degrade the visual realism of the resulting 3D city model. Unfortunately, such objects are prevalent in urban scenes. We therefore extend the reconstruction framework with an object recognition module that automatically detects cars in the input video streams and localizes them in 3D. The two components of our system are tightly integrated and benefit from each other's continuous input. 3D reconstruction delivers geometric scene context, which greatly improves detection precision. Conversely, the detected car locations are used to instantiate virtual placeholder models that enhance the visual realism of the reconstructed city model.
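One common way geometric scene context can raise detection precision is a ground-plane plausibility check. If the camera height and the horizon are known, the bottom edge of a detection box fixes the object's distance. The box height in pixels then implies a metric object height. Detections whose implied height is implausible for a car can be discarded, and the surviving ones can be localized in 3D.

The following is a minimal sketch of that general technique, not the paper's exact procedure. It assumes:

- a pinhole camera with a horizontal optical axis at height `cam_height` above flat ground;
- image rows that grow downward;
- a horizon at row `horizon_row`.

The class, the function names and the 1.0 to 2.2 m height window are hypothetical choices rather than values taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Detection:
    v_top: float     # image row of the box's top edge (pixels, rows grow downward)
    v_bottom: float  # image row of the box's bottom edge (assumed ground contact)
    score: float     # detector confidence


def ground_distance(det: Detection, focal: float, cam_height: float,
                    horizon_row: float) -> Optional[float]:
    """Distance to the ground contact point: Z = f * h / (v_bottom - v0)."""
    dv = det.v_bottom - horizon_row
    if dv <= 0:
        return None  # contact point at or above the horizon: not on the ground plane
    return focal * cam_height / dv


def implied_height(det: Detection, cam_height: float,
                   horizon_row: float) -> Optional[float]:
    """Metric height H = h * (v_bottom - v_top) / (v_bottom - v0); the focal length cancels."""
    dv = det.v_bottom - horizon_row
    if dv <= 0:
        return None
    return cam_height * (det.v_bottom - det.v_top) / dv


def filter_by_ground_plane(dets: List[Detection], cam_height: float,
                           horizon_row: float, min_h: float = 1.0,
                           max_h: float = 2.2) -> List[Detection]:
    """Keep detections whose implied height lies in a (hypothetical) car-height window."""
    kept = []
    for d in dets:
        h = implied_height(d, cam_height, horizon_row)
        if h is not None and min_h <= h <= max_h:
            kept.append(d)
    return kept


if __name__ == "__main__":
    dets = [
        Detection(v_top=240, v_bottom=340, score=0.9),  # implied height 1.5 m
        Detection(v_top=100, v_bottom=260, score=0.8),  # implied height 12 m
        Detection(v_top=150, v_bottom=200, score=0.7),  # bottom above the horizon
    ]
    for d in filter_by_ground_plane(dets, cam_height=1.5, horizon_row=240):
        print(d.score, ground_distance(d, focal=500.0, cam_height=1.5, horizon_row=240))
```

The height formula follows from similar triangles. A ground point at distance Z projects to row v0 + f·h/Z. A point at height H projects to v0 + f·(h − H)/Z. Their difference gives H = Z·(v_bottom − v_top)/f.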



Author information

Correspondence to Nico Cornelis.

About this article

Cite this article

Cornelis, N., Leibe, B., Cornelis, K. et al. 3D Urban Scene Modeling Integrating Recognition and Reconstruction. Int J Comput Vis 78, 121–141 (2008). https://doi.org/10.1007/s11263-007-0081-9

  • DOI: https://doi.org/10.1007/s11263-007-0081-9

Keywords

Navigation