Journal of Real-Time Image Processing, Volume 2, Issue 2–3, pp 103–115

Real-time view-based pose recognition and interpolation for tracking initialization

  • Michael Felsberg
  • Johan Hedborg
Special Issue


In this paper we propose a new approach to real-time view-based pose recognition and interpolation. Pose recognition is particularly useful for identifying camera views in databases, video sequences, video streams, and live recordings. All of these applications require a fast pose recognition process, in many cases in video real-time. It should further be possible to extend the database with new material, i.e., to update the recognition system online. The method that we propose is based on P-channels, a special kind of information representation which combines advantages of histograms and local linear models. Our approach is motivated by its similarity to information representation in biological systems, but its main advantage is its robustness against common distortions such as clutter and occlusion. The recognition algorithm consists of three steps: (1) low-level image features for color and local orientation are extracted at each point of the image; (2) these features are encoded into P-channels by combining similar features within local image regions; (3) the query P-channels are compared to a set of prototype P-channels in a database using a least-squares approach. The algorithm is applied in two scene registration experiments with fisheye camera data, one for pose interpolation from synthetic images and one for finding the nearest view in a set of real images. The method compares favorably to SIFT-based methods, in particular concerning interpolation. The method can be used for initializing pose-tracking systems, either when starting the tracking or when the tracking has failed and the system needs to re-initialize. Due to its real-time performance, the method can also be embedded directly into the tracking system, allowing a sensor fusion unit to choose dynamically between the frame-by-frame tracking and the pose recognition.
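The encoding and matching steps described above can be illustrated with a small sketch. It is not the authors' implementation: it assumes scalar features already scaled to [0, 1), whereas the paper works with color and orientation features within local image regions, and it reduces the least-squares comparison to picking the prototype with the smallest squared error. The function names `encode_p_channels`, `squared_distance`, and `nearest_prototype` are hypothetical.

```python
from math import floor


def encode_p_channels(values, n_bins):
    """Encode scalar features in [0, 1) into a P-channel-style vector.

    Assumption for this sketch: each bin stores a count (the histogram
    part) and the summed offset of its samples from the bin center
    (the local linear part). Both parts are normalized by sample count.
    """
    counts = [0.0] * n_bins
    offsets = [0.0] * n_bins
    for v in values:
        scaled = v * n_bins                               # position in bin units
        k = min(max(int(floor(scaled)), 0), n_bins - 1)   # clamp to a valid bin
        counts[k] += 1.0
        offsets[k] += scaled - (k + 0.5)                  # offset from bin center
    total = float(len(values)) or 1.0                     # avoid division by zero
    return [c / total for c in counts] + [o / total for o in offsets]


def squared_distance(a, b):
    """Sum of squared differences between two equally long vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_prototype(query, prototypes):
    """Return the index of the prototype with the smallest squared error."""
    return min(range(len(prototypes)),
               key=lambda i: squared_distance(query, prototypes[i]))


# Two made-up "views", each described by a handful of scalar features.
views = [[0.10, 0.15, 0.80], [0.50, 0.55, 0.60]]
prototypes = [encode_p_channels(v, 4) for v in views]

# A query that resembles the first view.
query = encode_p_channels([0.12, 0.14, 0.82], 4)
best = nearest_prototype(query, prototypes)
```

Storing the offsets alongside the counts is what distinguishes this representation from a plain histogram: two feature sets that fall into the same bins still differ in their offset components, which is the kind of sub-bin information an interpolation step can draw on.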


Pose recognition, Pose interpolation, P-channels, Real-time processing, View-based computer vision



We thank our project partners for providing the test data used in the experiments. We thank in particular Graham Thomas, Jigna Chandaria, Gabriele Bleser, Reinhard Koch, and Kevin Koeser.



Copyright information

© Springer-Verlag 2007

Authors and Affiliations

  1. Department of Electrical Engineering, Computer Vision Laboratory, Linköping University, Linköping, Sweden
