Detailed Real-Time Urban 3D Reconstruction from Video

  • M. Pollefeys
  • D. Nistér
  • J.-M. Frahm
  • A. Akbarzadeh
  • P. Mordohai
  • B. Clipp
  • C. Engels
  • D. Gallup
  • S.-J. Kim
  • P. Merrell
  • C. Salmi
  • S. Sinha
  • B. Talton
  • L. Wang
  • Q. Yang
  • H. Stewénius
  • R. Yang
  • G. Welch
  • H. Towles
Abstract

The paper presents a system for automatic, geo-registered, real-time 3D reconstruction from video of urban scenes. The system collects video streams, as well as GPS and inertial measurements, in order to place the reconstructed models in geo-registered coordinates. It is designed using current state-of-the-art real-time modules for all processing steps, and it employs commodity graphics hardware and standard CPUs to achieve real-time performance. We present the main considerations in designing the system and the steps of the processing pipeline. Our system extends existing algorithms to meet the robustness and variability necessary to operate out of the lab. To account for the large dynamic range of outdoor videos, the processing pipeline estimates global camera gain changes in the feature tracking stage and efficiently compensates for these in stereo estimation without impacting the real-time performance. The required accuracy for many applications is achieved with a two-step stereo reconstruction process exploiting the redundancy across frames. We show results on real video sequences comprising hundreds of thousands of frames.
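
The gain handling mentioned in the abstract can be illustrated with a minimal sketch. It assumes a linear camera response and a single global multiplicative gain between two frames. The function names, the least-squares estimator, and the sample intensities are illustrative assumptions, not the implementation described in the paper.

```python
def estimate_gain(ref_samples, cur_samples):
    """Least-squares gain beta minimising sum((cur - beta * ref)^2).

    Assumption: the samples are intensities of the same tracked features
    in a reference frame and in the current frame.
    """
    num = sum(r * c for r, c in zip(ref_samples, cur_samples))
    den = sum(r * r for r in ref_samples)
    if den == 0:
        raise ValueError("reference samples are all zero")
    return num / den


def sad_cost(ref_patch, cur_patch, gain=1.0):
    """Sum of absolute differences after scaling the reference patch by gain."""
    return sum(abs(gain * r - c) for r, c in zip(ref_patch, cur_patch))


# Hypothetical feature intensities; the current frame is 1.5x brighter.
ref_features = [10.0, 20.0, 40.0, 80.0]
cur_features = [15.0, 30.0, 60.0, 120.0]
beta = estimate_gain(ref_features, cur_features)

# Hypothetical matching patches of the same surface in both frames.
ref_patch = [12.0, 24.0, 36.0]
cur_patch = [18.0, 36.0, 54.0]
uncompensated = sad_cost(ref_patch, cur_patch)
compensated = sad_cost(ref_patch, cur_patch, gain=beta)
```

In this sketch the gain is estimated once per frame pair from the tracked features, so the stereo stage only pays for one extra multiplication per pixel. Without compensation, a correct match between the two patches would still receive a high cost purely because of the exposure change.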

Keywords

3D reconstruction · Stereo vision · Structure from motion · Large scale modeling · Urban reconstruction · Plane sweeping · Depth map fusion

Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  • M. Pollefeys (1)
  • D. Nistér (2)
  • J.-M. Frahm (1)
  • A. Akbarzadeh (2)
  • P. Mordohai (1)
  • B. Clipp (1)
  • C. Engels (2)
  • D. Gallup (1)
  • S.-J. Kim (1)
  • P. Merrell (1)
  • C. Salmi (1)
  • S. Sinha (1)
  • B. Talton (1)
  • L. Wang (2)
  • Q. Yang (2)
  • H. Stewénius (2)
  • R. Yang (2)
  • G. Welch (1)
  • H. Towles (1)
  1. Department of Computer Science, University of North Carolina, Chapel Hill, USA
  2. Center for Visualization and Virtual Environments, University of Kentucky, Lexington, USA
