Detailed Real-Time Urban 3D Reconstruction from Video

International Journal of Computer Vision

Abstract

The paper presents a system for automatic, geo-registered, real-time 3D reconstruction from video of urban scenes. The system collects video streams, as well as GPS and inertial measurements, in order to place the reconstructed models in geo-registered coordinates. It is designed using current state-of-the-art real-time modules for all processing steps, and it employs commodity graphics hardware and standard CPUs to achieve real-time performance. We present the main considerations in designing the system and the steps of the processing pipeline. Our system extends existing algorithms to meet the robustness and variability requirements of operating outside the lab. To account for the large dynamic range of outdoor videos, the processing pipeline estimates global camera gain changes in the feature tracking stage and efficiently compensates for these in stereo estimation without impacting the real-time performance. The accuracy required by many applications is achieved with a two-step stereo reconstruction process that exploits the redundancy across frames. We show results on real video sequences comprising hundreds of thousands of frames.
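
The abstract does not spell out how the gain is estimated or applied, so the following is only an illustrative sketch of the general idea, not the paper's formulation. It assumes a single multiplicative gain between consecutive frames and fits it by least squares to the intensities sampled at tracked feature locations. The function names `estimate_global_gain` and `compensate_gain` are hypothetical.

```python
from typing import List, Sequence


def estimate_global_gain(prev_vals: Sequence[float],
                         curr_vals: Sequence[float]) -> float:
    """Fit one gain beta so that curr ~= beta * prev (least squares).

    prev_vals and curr_vals are intensities sampled at the same tracked
    features in the previous and current frame (an assumed input format).
    """
    if len(prev_vals) != len(curr_vals):
        raise ValueError("intensity lists must have the same length")
    # Closed-form minimizer of sum((curr - beta * prev)^2).
    num = sum(p * c for p, c in zip(prev_vals, curr_vals))
    den = sum(p * p for p in prev_vals)
    if den == 0.0:
        raise ValueError("previous-frame intensities are all zero")
    return num / den


def compensate_gain(curr_vals: Sequence[float], beta: float) -> List[float]:
    """Undo the estimated gain so both frames are photometrically comparable."""
    return [c / beta for c in curr_vals]


if __name__ == "__main__":
    prev = [10.0, 20.0, 40.0]
    curr = [12.0, 24.0, 48.0]  # same features, 20% brighter
    beta = estimate_global_gain(prev, curr)
    print(beta, compensate_gain(curr, beta))
```

In a stereo stage, the compensated intensities (or equivalently, a gain-scaled matching cost) could then be compared directly across frames; because the gain is one scalar per frame pair, the correction adds only a multiplication per pixel.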

Author information

Corresponding author

Correspondence to P. Mordohai.

About this article

Cite this article

Pollefeys, M., Nistér, D., Frahm, JM. et al. Detailed Real-Time Urban 3D Reconstruction from Video. Int J Comput Vis 78, 143–167 (2008). https://doi.org/10.1007/s11263-007-0086-4

  • DOI: https://doi.org/10.1007/s11263-007-0086-4
