3D Urban Scene Modeling Integrating Recognition and Reconstruction

  • Nico Cornelis
  • Bastian Leibe
  • Kurt Cornelis
  • Luc Van Gool

Abstract

Supplying realistically textured 3D city models at ground level promises to be useful for pre-visualizing upcoming traffic situations in car navigation systems. Because this pre-visualization can be rendered from the expected future viewpoints of the driver, the required maneuver will be more easily understandable. 3D city models can be reconstructed from the imagery recorded by surveying vehicles. The vastness of image material gathered by these vehicles, however, puts extreme demands on vision algorithms to ensure their practical usability. Algorithms need to be as fast as possible and should result in compact, memory-efficient 3D city models for future ease of distribution and visualization. For the considered application, these are not contradictory demands. Simplified geometry assumptions can speed up vision algorithms while automatically guaranteeing compact geometry models. In this paper, we present a novel city modeling framework which builds upon this philosophy to create 3D content at high speed.

Objects in the environment, such as cars and pedestrians, may however disturb the reconstruction, as they violate the simplified geometry assumptions, leading to visually unpleasant artifacts and degrading the visual realism of the resulting 3D city model. Unfortunately, such objects are prevalent in urban scenes. We therefore extend the reconstruction framework by integrating it with an object recognition module that automatically detects cars in the input video streams and localizes them in 3D. The two components of our system are tightly integrated and benefit from each other’s continuous input. 3D reconstruction delivers geometric scene context, which greatly helps improve detection precision. The detected car locations, on the other hand, are used to instantiate virtual placeholder models which augment the visual realism of the reconstructed city model.

Keywords

  • City modeling
  • Structure from motion
  • 3D reconstruction
  • Object detection
  • Temporal integration

Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  • Nico Cornelis (1)
  • Bastian Leibe (2)
  • Kurt Cornelis (1)
  • Luc Van Gool (1, 2)

  1. KU Leuven, Leuven, Belgium
  2. ETH Zurich, Zurich, Switzerland
