International Journal of Computer Vision, Volume 91, Issue 2, pp 157–174

Omnidirectional Image Stabilization for Visual Object Recognition



In this paper, we present a pipeline for camera pose and trajectory estimation, and for image stabilization and rectification, that handles both dense and wide-baseline omnidirectional image sequences. The proposed pipeline transforms a set of images taken by a single hand-held camera into a set of stabilized and rectified images, augmented by the computed 3D camera trajectory and a reconstruction of feature points, which facilitates visual object recognition. The paper generalizes previous work on camera trajectory estimation for perspective images to omnidirectional images and introduces a new technique for omnidirectional image rectification that is suited to recognizing people and cars in images. The performance of the pipeline is demonstrated on real image sequences acquired in urban as well as natural environments.


Omnidirectional vision · Structure from motion · Image rectification · Object recognition



Supplementary material

11263_2010_350_MOESM1_ESM.mpg (MPG, 4.56 MB)

(MPG 4.58 MB)



Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  1. Center for Machine Perception, Department of Cybernetics, Faculty of Elec. Eng., Czech Technical University in Prague, Prague 2, Czech Republic
