Accurate Single Image Multi-modal Camera Pose Estimation

  • Christoph Bodensteiner
  • Marcus Hebel
  • Michael Arens
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6554)


A well-known problem in photogrammetry and computer vision is the precise and robust determination of camera poses with respect to a given 3D model. In this work, we propose a novel multi-modal method for single-image camera pose estimation with respect to 3D models with intensity information (e.g., LiDAR data with reflectance information).

We use a direct point-based rendering approach to generate synthetic 2D views from 3D datasets in order to bridge the dimensionality gap. The proposed method then establishes 2D/2D point and local region correspondences based on a novel self-similarity distance measure. Correct correspondences are robustly identified by using a Generalized Hough Transform to search for small regions whose local self-similarities share a similar geometric relationship. After backprojection of the generated features into 3D, a standard Perspective-n-Point problem is solved to yield an initial camera pose. The pose is then accurately refined using an intensity-based 2D/3D registration approach.
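The rendering and backprojection steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a simple pinhole camera with the principal point at the image centre, one pixel per 3D point, and a z-buffer for visibility. The function names `render_points` and `backproject` and all parameters are hypothetical, chosen only for illustration.

```python
import math


def render_points(points, intensities, R, t, f, width, height):
    """Project 3D points with intensity values into a synthetic 2D view.

    Assumptions (illustrative only): pinhole camera, focal length f in
    pixels, principal point at the image centre, one pixel per point.
    R is a 3x3 rotation given as a list of rows, t is a 3-vector.
    Returns (image, index_map); index_map records which 3D point produced
    each pixel, so that 2D features can later be lifted back into 3D.
    """
    cx, cy = width / 2.0, height / 2.0
    image = [[0.0] * width for _ in range(height)]
    depth = [[math.inf] * width for _ in range(height)]
    index_map = [[-1] * width for _ in range(height)]

    for i, (p, value) in enumerate(zip(points, intensities)):
        # Transform from world to camera coordinates: x_cam = R * p + t.
        xc, yc, zc = (
            sum(R[r][k] * p[k] for k in range(3)) + t[r] for r in range(3)
        )
        if zc <= 0:
            continue  # point lies behind the camera
        u = int(round(f * xc / zc + cx))
        v = int(round(f * yc / zc + cy))
        # z-buffer test: keep only the point closest to the camera.
        if 0 <= u < width and 0 <= v < height and zc < depth[v][u]:
            depth[v][u] = zc
            image[v][u] = value
            index_map[v][u] = i
    return image, index_map


def backproject(index_map, points, u, v):
    """Return the 3D point rendered at pixel (u, v), or None if empty."""
    i = index_map[v][u]
    return None if i < 0 else points[i]
```

Keeping an index map alongside the rendered intensities is one straightforward way to turn 2D/2D correspondences found in the synthetic view into the 2D/3D correspondences that a Perspective-n-Point solver needs.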

An evaluation on visible/infrared (Vis/IR) 2D datasets and on airborne and terrestrial 3D datasets shows that the proposed method is applicable to a wide range of sensor types. In addition, the approach outperforms standard global multi-modal 2D/3D registration approaches based on Mutual Information in both robustness and speed.
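For readers unfamiliar with the Mutual Information baseline mentioned above, the following sketch shows how the measure can be computed from a joint intensity histogram of two equally sized images. It is a generic illustration, not the evaluation code of the paper; the function name `mutual_information`, the bin count, and the assumed intensity range are hypothetical choices.

```python
import math
from collections import Counter


def mutual_information(img_a, img_b, bins=8, max_value=256):
    """Mutual information (in nats) of two equally sized grayscale images.

    Images are lists of rows with intensities in [0, max_value).
    The bin count and intensity range are assumptions for illustration.
    """
    scale = bins / max_value
    # Quantise both images and pair up corresponding pixels.
    pairs = [
        (min(int(a * scale), bins - 1), min(int(b * scale), bins - 1))
        for row_a, row_b in zip(img_a, img_b)
        for a, b in zip(row_a, row_b)
    ]
    n = len(pairs)
    joint = Counter(pairs)                   # joint histogram
    hist_a = Counter(a for a, _ in pairs)    # marginal histogram of image A
    hist_b = Counter(b for _, b in pairs)    # marginal histogram of image B

    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # p_ab / (p_a * p_b) written in terms of raw counts.
        mi += p_ab * math.log(count * n / (hist_a[a] * hist_b[b]))
    return mi
```

Because the measure depends only on the statistical co-occurrence of intensities, it stays high when one image is, for example, an inverted version of the other, which is why it is a common choice for multi-modal registration.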

Potential applications are widespread and include, for instance, multi-spectral texturing of 3D models, SLAM applications, sensor data fusion, multi-spectral camera calibration, and super-resolution applications.


Keywords: Multi-Modal Registration, Pose Estimation, Multi-Modal 2D/3D Correspondences, Self-Similarity Distance Measure





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Christoph Bodensteiner (1)
  • Marcus Hebel (1)
  • Michael Arens (1)
  1. Fraunhofer IOSB, Ettlingen, Germany
