Abstract
The need for error modeling, multisensor fusion, and robust algorithms is becoming increasingly recognized in computer vision. Bayesian modeling is a powerful, practical, and general framework for meeting these requirements. This article develops a Bayesian model for describing and manipulating the dense fields, such as depth maps, associated with low-level computer vision. Our model consists of three components: a prior model, a sensor model, and a posterior model. The prior model captures a priori information about the structure of the field. We construct this model using the smoothness constraints from regularization to define a Markov Random Field. The sensor model describes the behavior and noise characteristics of our measurement system. We develop a number of sensor models for both sparse and dense measurements. The posterior model combines the information from the prior and sensor models using Bayes' rule. We show how to compute optimal estimates from the posterior model and also how to compute the uncertainty (variance) in these estimates. To demonstrate the utility of our Bayesian framework, we present three examples of its application to real vision problems. The first application is the on-line extraction of depth from motion. Using a two-dimensional generalization of the Kalman filter, we develop an incremental algorithm that provides a dense on-line estimate of depth whose accuracy improves over time. In the second application, we use a Bayesian model to determine observer motion from sparse depth (range) measurements. In the third application, we use the Bayesian interpretation of regularization to choose the optimal smoothing parameter for interpolation. The uncertainty modeling techniques that we develop, and the utility of these techniques in various applications, support our claim that Bayesian modeling is a powerful and practical framework for low-level vision.
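As an illustration only, and not the article's own implementation, the interplay of the three components can be sketched for a small one-dimensional field. The sketch assumes a first-order (membrane) smoothness prior, independent Gaussian noise on sparse point measurements, and a dense matrix inverse; the function name `posterior_mean_and_variance` and the parameters `lam` (prior strength) and `sigma` (sensor noise standard deviation) are hypothetical names chosen for this example.

```python
import numpy as np


def posterior_mean_and_variance(n, obs_idx, obs_val, sigma, lam):
    """Posterior mean and per-site variance of a 1-D Gaussian field.

    Assumed prior: energy (lam / 2) * sum_i (u[i+1] - u[i])**2.
    Assumed sensor model: d_k = u[obs_idx[k]] + noise, noise ~ N(0, sigma**2).
    At least one observation is required so the posterior is proper.
    """
    obs_idx = np.asarray(obs_idx, dtype=int)
    obs_val = np.asarray(obs_val, dtype=float)

    # Prior precision (inverse covariance) from the smoothness constraint.
    D = np.diff(np.eye(n), axis=0)  # (n-1) x n first-difference operator
    A_prior = lam * D.T @ D

    # Sensor model: H selects the measured sites.
    H = np.zeros((len(obs_idx), n))
    H[np.arange(len(obs_idx)), obs_idx] = 1.0
    A_data = H.T @ H / sigma**2
    b = H.T @ obs_val / sigma**2

    # Bayes' rule for Gaussians: precisions add, and the posterior mean
    # solves (A_prior + A_data) u = b.
    cov = np.linalg.inv(A_prior + A_data)
    mean = cov @ b
    return mean, np.diag(cov)


if __name__ == "__main__":
    # Two accurate measurements at the ends of a five-sample field.
    mean, var = posterior_mean_and_variance(
        n=5, obs_idx=[0, 4], obs_val=[0.0, 4.0], sigma=0.01, lam=1.0
    )
    print("posterior mean:", mean)
    print("posterior variance:", var)
```

In this toy setting the estimate interpolates between the measured sites, and the variance grows with distance from them. The dense inverse is used only for brevity and would not scale to image-sized fields.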
Cite this article
Szeliski, R. Bayesian modeling of uncertainty in low-level vision. Int J Comput Vision 5, 271–301 (1990). https://doi.org/10.1007/BF00126502
DOI: https://doi.org/10.1007/BF00126502