Machine Learning, Volume 50, Issue 1–2, pp 45–71

EM, MCMC, and Chain Flipping for Structure from Motion with Unknown Correspondence

  • Frank Dellaert
  • Steven M. Seitz
  • Charles E. Thorpe
  • Sebastian Thrun


Learning spatial models from sensor data raises the challenging data association problem of relating model parameters to individual measurements. This paper proposes an EM-based algorithm that solves the model learning and data association problems in parallel. The algorithm is developed in the context of the structure from motion problem, which is the problem of estimating a 3D scene model from a collection of image data. To accommodate the spatial constraints in this domain, we compute virtual measurements as sufficient statistics to be used in the M-step. We develop an efficient Markov chain Monte Carlo sampling method, called chain flipping, to calculate these statistics in the E-step. Experimental results show that we can solve hard data association problems when learning models of 3D scenes, and that we can do so efficiently. We conjecture that this approach can be applied to a broad range of model learning problems from sensor data, such as the robot mapping problem.
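As a rough illustration of the E-step idea described above, the following minimal sketch samples one-to-one assignments between predicted feature positions and measurements with a plain Metropolis sampler that proposes swaps. It is not the chain flipping method of the paper, whose details are not given in this abstract. The Gaussian noise model, the function names `sample_marginals` and `virtual_measurements`, and the choice to form each virtual measurement as a marginal-weighted average of the measurements are all assumptions made for this sketch.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sample_marginals(predicted, measured, sigma, n_samples=5000, burn_in=500, seed=0):
    """Estimate P(feature j <-> measurement k) by Metropolis sampling.

    Assumption: isotropic Gaussian measurement noise with std dev sigma,
    and a one-to-one assignment (a permutation). Requires at least 2 features.
    """
    rng = random.Random(seed)
    n = len(predicted)
    perm = list(range(n))  # perm[j] = measurement currently assigned to feature j
    counts = [[0] * n for _ in range(n)]

    def cost(j, k):
        # Negative log-likelihood (up to a constant) of pairing j with k.
        return sq_dist(predicted[j], measured[k]) / (2.0 * sigma ** 2)

    for step in range(burn_in + n_samples):
        # Symmetric proposal: swap the measurements of two distinct features.
        i, j = rng.sample(range(n), 2)
        old = cost(i, perm[i]) + cost(j, perm[j])
        new = cost(i, perm[j]) + cost(j, perm[i])
        # Metropolis acceptance rule.
        if new <= old or rng.random() < math.exp(old - new):
            perm[i], perm[j] = perm[j], perm[i]
        if step >= burn_in:
            for f, k in enumerate(perm):
                counts[f][k] += 1
    return [[c / n_samples for c in row] for row in counts]


def virtual_measurements(marginals, measured):
    """Marginal-weighted average of the measurements for each feature (assumed form)."""
    dim = len(measured[0])
    return [
        tuple(sum(w * u[d] for w, u in zip(row, measured)) for d in range(dim))
        for row in marginals
    ]
```

In an EM loop built on this sketch, the M-step would re-estimate the model from the virtual measurements as if correspondence were known, and the E-step would then be rerun with the updated predictions.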

expectation-maximization, Markov chain Monte Carlo, data association, structure from motion, correspondence problem, efficient sampling, computer vision



Copyright information

© Kluwer Academic Publishers 2003

Authors and Affiliations

  • Frank Dellaert (1)
  • Steven M. Seitz (2)
  • Charles E. Thorpe (3)
  • Sebastian Thrun (3)

  1. College of Computing, Georgia Institute of Technology, Atlanta, USA
  2. Department of Computer Science and Engineering, University of Washington, Seattle, USA
  3. School of Computer Science, Carnegie Mellon University, Pittsburgh, USA
