Journal of Real-Time Image Processing, Volume 2, Issue 2–3, pp 133–147

Robust GPU-assisted camera tracking using free-form surface models

  • Kevin Koeser
  • Bogumil Bartczak
  • Reinhard Koch
Special Issue


We propose a marker-less, model-based camera tracking approach that applies GPU-assisted analysis-by-synthesis methods to a very wide field-of-view (e.g. fish-eye) camera. After an initial registration based on a learned database of robust features, the synthesis part of the tracking is performed on graphics hardware, which simulates the internal and external parameters of the camera and thereby minimizes lens and viewpoint differences between a model view and a real camera image. Based on an automatically reconstructed free-form surface model, we analyze the sensitivity of the tracking to the model accuracy, in particular when curved surfaces are represented by planar patches. We also examine accuracy and show, on both synthetic and real data, that the system does not suffer from drift accumulation. The wide field of view of the camera and the subdivision of our reference model into many textured free-form surface patches make the system robust against illumination changes, moving persons and other occlusions in the environment, and provide a camera pose estimate in a fixed and known coordinate system.
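The core analysis-by-synthesis idea — render a synthetic view of the model under a candidate pose, compare it photometrically to the real camera image, and keep the pose that minimizes the difference — can be illustrated with a deliberately minimal sketch. This is not the authors' GPU implementation: the "pose" here is a hypothetical integer 2D shift and the "renderer" is a simple image shift, standing in for rendering the textured free-form surface model under full intrinsic and extrinsic camera parameters.

```python
import numpy as np

def synthesize(pose, model):
    """Stand-in for the GPU synthesis step: shift a textured patch by an
    integer 2D 'pose'. A real system would render the free-form surface
    model under the camera's internal and external parameters."""
    dy, dx = pose
    return np.roll(np.roll(model, dy, axis=0), dx, axis=1)

def track(camera_image, model, search=3):
    """Analysis step: exhaustively test candidate poses and return the one
    whose synthetic view best matches the camera image (SSD photometric
    error). Real trackers minimize this error iteratively instead."""
    best_pose, best_err = None, np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            err = np.sum((synthesize((dy, dx), model) - camera_image) ** 2)
            if err < best_err:
                best_pose, best_err = (dy, dx), err
    return best_pose

# Simulate one tracking step: the observed image is the model seen
# under an unknown shift, which the analysis loop recovers.
rng = np.random.default_rng(0)
model = rng.random((32, 32))
camera_image = synthesize((2, -1), model)
print(track(camera_image, model))  # recovers (2, -1)
```

The same structure carries over to the paper's setting: replacing the shift with a 6-DOF pose and the `np.roll` renderer with hardware rendering of the surface patches turns the inner comparison into the lens- and viewpoint-compensated matching described above.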


Keywords: Analysis by synthesis · Markerless tracking · Spherical camera · GPU



This work has been partially funded by the European Union in project MATRIS IST-002013.



Copyright information

© Springer-Verlag 2007

Authors and Affiliations

  1. Institute of Computer Science, Christian-Albrechts-Universität Kiel, Kiel, Germany
