Robust video mosaicing through topology inference and local to global alignment

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1407)


The problem of piecing together individual frames in a video sequence to create seamless panoramas (video mosaics) has attracted increasing attention in recent years. One challenge in this domain has been to create high-quality, seamless mosaics rapidly and automatically using inexpensive cameras and relatively unconstrained freehand motion.

In order to capture a wide-angle scene using a video sequence of relatively narrow-angle views, the scene must be scanned in a 2D pattern. This is akin to painting a canvas on a 2D manifold with the video frames, using multiple connected 1D brush strokes. An important issue in this context is aligning frames captured by a 2D scan of the scene, rather than the 1D scan used in many existing mosaicing systems.
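To see why a 2D scan changes the alignment problem, the hypothetical sketch below (not taken from the paper) generates frame centres for a serpentine scan of 3 strokes with 4 frames each. It then compares the pairs a 1D chain would register (consecutive frames only) with the pairs whose footprints actually overlap. The axis-aligned rectangle overlap test and the 1.5-unit frame size are illustrative assumptions.

```python
from itertools import combinations


def serpentine_scan(rows, cols, step=1.0):
    """Hypothetical frame-centre positions for a 2D back-and-forth scan.

    Each row is one 1D "brush stroke"; odd rows run right-to-left so the
    camera path stays continuous across the whole canvas.
    """
    path = []
    for r in range(rows):
        cols_in_order = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        for c in cols_in_order:
            path.append((c * step, r * step))
    return path


def temporal_pairs(n):
    """Pairs a 1D mosaicing chain would register: consecutive frames only."""
    return {(i, i + 1) for i in range(n - 1)}


def spatial_pairs(positions, frame_w, frame_h):
    """Pairs whose axis-aligned frame footprints overlap (assumed test)."""
    pairs = set()
    for i, j in combinations(range(len(positions)), 2):
        dx = abs(positions[i][0] - positions[j][0])
        dy = abs(positions[i][1] - positions[j][1])
        if dx < frame_w and dy < frame_h:
            pairs.add((i, j))
    return pairs


positions = serpentine_scan(rows=3, cols=4)
chain = temporal_pairs(len(positions))
overlaps = spatial_pairs(positions, frame_w=1.5, frame_h=1.5)
missed = sorted(overlaps - chain)
print(len(chain), len(overlaps), len(missed))  # 11 29 18
```

In this toy setting, 18 of the 29 overlapping pairs (for example frames 0 and 7, which sit in adjacent strokes) are never compared by a purely sequential chain. Those are exactly the constraints a 2D-aware method can exploit.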

In this paper we present an end-to-end solution to the problem of video mosaicing when the transformations between frames can be modeled parametrically. We provide solutions to two key problems: (i) automatic inference of the topology of the video frames on a 2D manifold, and (ii) globally consistent estimation of alignment parameters that map each frame to a common mosaic coordinate system. Our method iterates among automatic topology determination, local alignment, and globally consistent parameter estimation to produce a coherent mosaic from a video sequence, regardless of the camera's scan path over the scene. Although the framework is developed independently of any specific alignment model, we illustrate the approach by constructing planar and spherical mosaics from real videos.
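The iteration described above can be illustrated with a deliberately simplified sketch. It assumes a pure 2D translation per frame, whereas the paper handles general parametric models and builds planar and spherical mosaics. All function names are hypothetical, and `measure` stands in for real image-based local registration. Starting from temporally adjacent pairs, each pass registers the current neighbour pairs and then solves a least-squares problem for frame positions consistent with all pairwise offsets, with frame 0 pinned at the origin. It then proposes new neighbours from frames whose estimated footprints overlap. The loop stops once the neighbour set stops growing.

```python
from itertools import combinations


def solve(a, b):
    """Dense Gaussian elimination with partial pivoting (small systems only)."""
    n = len(b)
    m = [row[:] + [b[k]] for k, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def global_translations(n, offsets):
    """Least-squares positions p_k from pairwise offsets d_ij ~ p_j - p_i.

    Frame 0 is pinned at the origin; the neighbour graph must be connected.
    """
    coords = []
    for axis in (0, 1):  # x and y decouple under a pure-translation model
        a = [[0.0] * (n - 1) for _ in range(n - 1)]
        b = [0.0] * (n - 1)
        for (i, j), d in offsets.items():
            terms = ((i, -1.0), (j, 1.0))  # residual = p_j - p_i - d_ij
            for k, sk in terms:
                if k == 0:
                    continue  # pinned frame is not an unknown
                b[k - 1] += sk * d[axis]
                for l, sl in terms:
                    if l != 0:
                        a[k - 1][l - 1] += sk * sl
        coords.append([0.0] + solve(a, b))
    return list(zip(coords[0], coords[1]))


def overlapping_pairs(pos, frame_w, frame_h):
    """Topology guess: frames whose estimated footprints overlap."""
    return {
        (i, j)
        for i, j in combinations(range(len(pos)), 2)
        if abs(pos[i][0] - pos[j][0]) < frame_w
        and abs(pos[i][1] - pos[j][1]) < frame_h
    }


def mosaic_positions(n, measure, frame_w, frame_h, max_iters=10):
    """Alternate local registration, global estimation and topology inference.

    `measure(i, j)` is a hypothetical local-registration callback returning
    the (dx, dy) offset of frame j relative to frame i.
    """
    edges = {(i, i + 1) for i in range(n - 1)}  # start from temporal neighbours
    for _ in range(max_iters):
        offsets = {e: measure(*e) for e in edges}  # local alignment
        pos = global_translations(n, offsets)  # globally consistent estimate
        new_edges = edges | overlapping_pairs(pos, frame_w, frame_h)  # topology
        if new_edges == edges:
            break
        edges = new_edges
    return pos, set(offsets)
```

Because x and y decouple under pure translation, each axis reduces to a small linear system whose matrix is the neighbour graph's Laplacian with frame 0's row and column removed. Richer alignment models, such as the projective or rotational ones needed for planar and spherical mosaics, would instead require nonlinear least squares over all frame parameters.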



Copyright information

© Springer-Verlag Berlin Heidelberg 1998

Authors and Affiliations

  1. Vision Technologies Laboratory, Sarnoff Corporation, Princeton, USA
