International Journal of Computer Vision, Volume 37, Issue 3, pp. 231–258

Structure from Motion: Beyond the Epipolar Constraint


  • Tomáš Brodský
    • Computer Vision Laboratory, Center for Automation Research, University of Maryland
  • Cornelia Fermüller
    • Computer Vision Laboratory, Center for Automation Research, University of Maryland
  • Yiannis Aloimonos
    • Computer Vision Laboratory, Center for Automation Research, University of Maryland

DOI: 10.1023/A:1008132107950

Cite this article as:
Brodský, T., Fermüller, C. & Aloimonos, Y. International Journal of Computer Vision (2000) 37: 231. doi:10.1023/A:1008132107950


The classic approach to structure from motion entails a clear separation between motion estimation and structure estimation, and between two-dimensional (2D) and three-dimensional (3D) information. Only 2D image measurements are used to recover the rigid transformation between different views. To obtain enough information, most existing techniques rely on the intermediate computation of optical flow, which, however, is problematic at depth discontinuities. If we knew where the depth discontinuities were, we could (using any of a multitude of approaches based on smoothness constraints) accurately estimate flow values for image patches corresponding to smooth scene patches; but locating the discontinuities requires solving the structure from motion problem first. This paper introduces a novel approach to structure from motion that addresses smoothing, 3D motion estimation, and structure estimation in a synergistic manner. It provides an algorithm for estimating the transformation between two views obtained by either a calibrated or an uncalibrated camera. The estimated transformation is then used to reconstruct the scene from a short sequence of images.

The technique is based on constraints on image derivatives that involve the 3D motion and shape of the scene, leading to a geometric and statistical estimation problem. The interaction between 3D motion and shape allows us to estimate the 3D motion while simultaneously segmenting the scene. If we use a wrong 3D motion estimate to compute depth, we obtain a distorted version of the depth function. The distortion, however, is such that the worse the motion estimate, the more likely we are to obtain depth estimates that vary locally more than the correct ones. Since local variability of depth is due either to a discontinuity or to a wrong 3D motion estimate, being able to distinguish between these two cases yields both the correct motion, namely the one that produces the “least varying” estimated depth, and the image locations of scene discontinuities. We analyze the new constraints, show their relationship to the minimization of the epipolar constraint, and present experimental results on real image sequences that indicate the robustness of the method.
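The "least varying depth" principle can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a calibrated camera with unit focal length, instantaneous rigid motion with a known rotation, a translation parametrized by its focus of expansion (FOE) `(x0, y0)` with forward component 1, and one constant inverse depth per small image patch. At each point, brightness constancy `Ix*u + Iy*v + It = 0` becomes linear in the inverse depth, so each patch admits a closed-form least-squares inverse depth; the leftover residual measures how much depth would have to vary within the patch for the candidate motion. All function names and the synthetic data generator are hypothetical, and depth positivity is ignored.

```python
import numpy as np

# Hypothetical sketch of a depth-variability search (not the authors' code).
# Assumptions: unit focal length, instantaneous rigid motion, known rotation,
# one constant inverse depth per image patch, translation parametrized by its
# focus of expansion (FOE) (x0, y0) with forward component W = 1.


def rotational_flow(x, y, omega):
    """Depth-independent rotational flow (one common sign convention, f = 1)."""
    a, b, g = omega
    u = a * x * y - b * (1.0 + x**2) + g * y
    v = a * (1.0 + y**2) - b * x * y - g * x
    return u, v


def depth_variability_cost(foe, x, y, Ix, Iy, It, omega=(0.0, 0.0, 0.0)):
    """Sum over patches of the residual left after fitting one inverse depth per patch.

    Arrays have shape (n_patches, n_points). Brightness constancy
    Ix*u + Iy*v + It = 0 with u = rho*(x - x0) + u_rot, v = rho*(y - y0) + v_rot
    gives a*rho + b = 0 at every point.
    """
    u_rot, v_rot = rotational_flow(x, y, omega)
    a = Ix * (x - foe[0]) + Iy * (y - foe[1])  # coefficient of inverse depth rho
    b = It + Ix * u_rot + Iy * v_rot           # depth-independent terms
    aa = np.sum(a * a, axis=1)
    ab = np.sum(a * b, axis=1)
    bb = np.sum(b * b, axis=1)
    # For each patch, min over rho of sum (a*rho + b)^2 equals bb - ab^2 / aa.
    return float(np.sum(bb - ab**2 / np.maximum(aa, 1e-12)))


def estimate_foe(data, grid, omega=(0.0, 0.0, 0.0)):
    """Exhaustive grid search for the FOE whose depth estimates vary least."""
    best, best_cost = None, np.inf
    for x0 in grid:
        for y0 in grid:
            c = depth_variability_cost((x0, y0), *data, omega=omega)
            if c < best_cost:
                best, best_cost = (float(x0), float(y0)), c
    return best, best_cost


def make_synthetic(foe, omega, n_patches=30, n_pts=25, seed=0):
    """Random gradients on small patches of a piecewise-constant-depth scene."""
    rng = np.random.default_rng(seed)
    centers = rng.uniform(-0.8, 0.8, size=(n_patches, 1, 2))
    pts = centers + rng.uniform(-0.05, 0.05, size=(n_patches, n_pts, 2))
    x, y = pts[..., 0], pts[..., 1]
    Z = rng.uniform(2.0, 8.0, size=(n_patches, 1))  # one depth per patch
    Ix = rng.normal(size=x.shape)
    Iy = rng.normal(size=x.shape)
    u_rot, v_rot = rotational_flow(x, y, omega)
    u = (x - foe[0]) / Z + u_rot
    v = (y - foe[1]) / Z + v_rot
    It = -(Ix * u + Iy * v)  # temporal derivative satisfying brightness constancy
    return x, y, Ix, Iy, It


if __name__ == "__main__":
    omega = (0.01, -0.02, 0.005)
    data = make_synthetic((0.3, -0.2), omega)
    grid = np.linspace(-1.0, 1.0, 41)
    print(estimate_foe(data, grid, omega))
```

On this noise-free synthetic data, the search should return the grid point at the true FOE with a near-zero cost, while wrong FOEs leave a positive residual in most patches. Scaling the translation rescales the fitted inverse depth but leaves each patch residual unchanged, which is why only the translation direction (the FOE) is searched. With real images, a patch straddling a depth discontinuity keeps a large residual even at the correct motion, which is the ambiguity the abstract describes between discontinuities and wrong motion estimates.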

Keywords: 3D motion estimation; scene reconstruction; smoothing and discontinuity detection; depth variability constraint

Copyright information

© Kluwer Academic Publishers 2000