Abstract
The problem of structure from motion is often decomposed into two steps: feature correspondence and three-dimensional reconstruction. This separation often causes gross errors when correspondence cannot be established. We therefore advocate integrating visual information not only in time (i.e., across different views) but also in space, by matching regions, rather than points, using explicit photometric deformation models. We present an algorithm that integrates image-feature tracking and three-dimensional motion estimation into a closed loop, while detecting and rejecting outlier regions that do not fit the model. Because of occlusions and the causal nature of our algorithm, drift accumulates in the estimates over time. We describe a method that performs global registration of local estimates of motion and structure by matching the appearance of feature regions stored over long time periods. We use image intensities to construct a score function that takes into account changes in brightness and contrast. Our algorithm is recursive and suitable for real-time implementation.
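To illustrate what an intensity-based score that tolerates changes in brightness and contrast can look like, here is a minimal sketch using normalized cross-correlation (NCC). NCC is a standard technique substituted here for illustration; the abstract does not state the form of the authors' score function. The function name `ncc_score`, the `eps` threshold, and the affine intensity model (gain for contrast, offset for brightness) are all assumptions.

```python
import numpy as np

def ncc_score(patch_a, patch_b, eps=1e-12):
    """Normalized cross-correlation between two equally sized image regions.

    Hypothetical helper, not the paper's score function. Subtracting the mean
    cancels an additive brightness offset; dividing by the norms cancels a
    positive multiplicative contrast gain.
    """
    a = np.asarray(patch_a, dtype=float).ravel()
    b = np.asarray(patch_b, dtype=float).ravel()
    if a.shape != b.shape:
        raise ValueError("patches must have the same number of pixels")
    a = a - a.mean()  # remove brightness offset
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom < eps:
        # A textureless (constant) region carries no appearance information.
        return 0.0
    return float(np.dot(a, b) / denom)

# Usage: a stored region compared with the same region under new lighting.
rng = np.random.default_rng(0)
stored = rng.random((9, 9))
relit = 1.7 * stored + 20.0      # assumed contrast gain and brightness offset
score = ncc_score(stored, relit)  # close to 1 despite the lighting change
```

A score near 1 indicates matching appearance, so a threshold on it could serve as a simple accept/reject test for a candidate region; the threshold value would have to be chosen empirically.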
Cite this article
Jin, H., Favaro, P. & Soatto, S. A semi-direct approach to structure from motion. Vis Comput 19, 377–394 (2003). https://doi.org/10.1007/s00371-003-0202-6
DOI: https://doi.org/10.1007/s00371-003-0202-6