
Real-time view-based pose recognition and interpolation for tracking initialization

  • Special Issue
  • Published in: Journal of Real-Time Image Processing

Abstract

In this paper we propose a new approach to real-time view-based pose recognition and interpolation. Pose recognition is particularly useful for identifying camera views in databases, video sequences, video streams, and live recordings. All of these applications require a fast pose recognition process, in many cases at video frame rate. It should further be possible to extend the database with new material, i.e., to update the recognition system online. The method that we propose is based on P-channels, a special kind of information representation that combines advantages of histograms and local linear models. Our approach is motivated by its similarity to information representation in biological systems, but its main advantage is its robustness against common distortions such as clutter and occlusion. The recognition algorithm consists of three steps: (1) low-level image features for color and local orientation are extracted at each point of the image; (2) these features are encoded into P-channels by combining similar features within local image regions; (3) the query P-channels are compared to a set of prototype P-channels in a database using a least-squares approach. The algorithm is applied in two scene-registration experiments with fisheye camera data, one for pose interpolation from synthetic images and one for finding the nearest view in a set of real images. The method compares favorably to SIFT-based methods, in particular concerning interpolation. The method can be used for initializing pose-tracking systems, either when starting the tracking or when the tracking has failed and the system needs to re-initialize. Due to its real-time performance, the method can also be embedded directly into the tracking system, allowing a sensor-fusion unit to choose dynamically between frame-by-frame tracking and pose recognition.
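
The three recognition steps can be sketched in miniature. The snippet below is our own simplified illustration, not the paper's implementation: it replaces P-channels with a plain soft histogram over regularly spaced triangular basis functions, and performs step (3) as a nearest-prototype search in the least-squares sense; all function names and parameters are assumptions.

```python
import numpy as np

def channel_encode(values, n_channels=8):
    """Soft histogram of scalar features in [0, 1] over regularly spaced,
    overlapping triangular basis functions -- a simplified stand-in for
    the paper's P-channel encoding step."""
    centers = np.linspace(0.0, 1.0, n_channels)
    width = centers[1] - centers[0]
    # Each value contributes to nearby channels with linearly decaying weight.
    weights = np.clip(1.0 - np.abs(values[:, None] - centers[None, :]) / width,
                      0.0, None)
    return weights.sum(axis=0)

def match_query(query, prototypes):
    """Step (3): return the index of the prototype channel vector with the
    smallest squared distance to the query."""
    dists = ((prototypes - query[None, :]) ** 2).sum(axis=1)
    return int(np.argmin(dists))

# Hypothetical example: three prototype views, each summarized by the
# channel encoding of 100 scalar features in [0, 1].
rng = np.random.default_rng(0)
prototypes = np.stack([channel_encode(rng.random(100)) for _ in range(3)])
# A query that is a slightly perturbed copy of the second prototype.
query = prototypes[1] + 0.01 * rng.standard_normal(prototypes.shape[1])
print(match_query(query, prototypes))  # → 1 (the perturbed prototype)
```

In the paper, each channel additionally carries local linear terms (the "P" part, combining histograms with local linear models), which is what enables accurate pose interpolation between prototypes rather than only nearest-view lookup.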



Notes

  1. http://www.ist-matris.org.

  2. By features we denote “a numerical property” [14].

  3. By recognition we denote “identification”, i.e., “the process of associating some observations with a particular instance […] that is already known” [14].

  4. There is no formal requirement for the channels to be regularly spaced or for the basis functions to be spatially invariant. For the purpose of density estimation, however, these restrictions make sense and simplify the exposition.

  5. Sampled in a signal-processing sense, not in a statistical sense.

  6. We based our implementation on the OpenCV library http://sourceforge.net/projects/opencvlibrary/.

  7. Note that, in general, feature extraction needs to be performed on the whole image, as the location of the relevant areas is unknown at this stage.
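
Note 4 can be made concrete with a small density-estimation sketch. The code below is our own illustration, not from the paper: it uses regularly spaced quadratic B-spline channels (one common choice of smooth, spatially invariant basis function in channel representations) to build a soft histogram that approximates a density; all names and parameter values are assumptions.

```python
import numpy as np

def bspline2(t):
    """Quadratic B-spline kernel (support |t| < 1.5): a smooth, spatially
    invariant basis function evaluated on a regular channel grid."""
    t = np.abs(t)
    return np.where(t < 0.5, 0.75 - t**2,
           np.where(t < 1.5, 0.5 * (1.5 - t)**2, 0.0))

def channel_density(samples, n_channels=12, lo=0.0, hi=1.0):
    """Soft histogram over regularly spaced channels: each sample spreads
    its weight over about three neighboring channels, giving a smoother
    density estimate than a hard histogram with the same bin count."""
    centers = np.linspace(lo, hi, n_channels)
    spacing = centers[1] - centers[0]
    coeffs = bspline2((samples[:, None] - centers[None, :]) / spacing).sum(axis=0)
    # Approximately a density: B-spline channels sum to one away from the borders.
    return centers, coeffs / (len(samples) * spacing)

rng = np.random.default_rng(1)
samples = rng.beta(2, 5, size=2000)        # skewed distribution on [0, 1]
centers, dens = channel_density(samples)
print(centers[np.argmax(dens)])            # channel center near the Beta(2,5) mode of 0.2
```

Irregular spacings or other kernels would work as well; the regular grid simply makes the estimate a fixed-size vector that can be compared channel-wise, as in the recognition step.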

References

  1. Agarwal, S., Awan, A., Roth, D.: Learning to detect objects in images via sparse, part-based representation. IEEE Trans. Pattern Anal. Mach. Intell. 26(11), 1475–1490 (2004)

  2. Berg, A., Berg, T., Malik, J.: Shape matching and object recognition using low distortion correspondences. In: IEEE Comput. Vis. Pattern Recognit., vol. 1, pp. 26–33 (2005). doi:10.1109/CVPR.2005.320

  3. Bishop, C.M.: Neural Networks for Pattern Recognition. Oxford University Press, New York (1995)

  4. Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, Heidelberg (2006)

  5. Brand, M.: Incremental singular value decomposition of uncertain data with missing values. Technical Report TR-2002-24, Mitsubishi Electric Research Laboratory (2002)

  6. Chen, Q., Defrise, M., Deconinck, F.: Symmetric phase-only matched filtering of Fourier–Mellin transforms for image registration and recognition. IEEE Trans. Pattern Anal. Mach. Intell. 16(12), 1156–1168 (1994)

  7. Cover, T.M., Thomas, J.A.: Elements of Information Theory. Wiley, New York (1991)

  8. Dimitriadou, E., Weingessel, A., Hornik, K.: Fuzzy voting in clustering. In: Fuzzy-Neuro Systems, pp. 63–75. Leipziger Universitätsverlag, Germany (1999)

  9. Farnebäck, G.: Spatial domain methods for orientation and velocity estimation. Lic. Thesis LiU-Tek-Lic-1999:13, Department of EE, Linköping University (1999)

  10. Felsberg, M., Forssén, P.-E., Scharr, H.: Channel smoothing: efficient robust smoothing of low-level signal features. IEEE Trans. Pattern Anal. Mach. Intell. 28(2), 209–222 (2006)

  11. Felsberg, M., Granlund, G.: P-channels: robust multivariate M-estimation of large datasets. In: International Conference on Pattern Recognition, Hong Kong (2006)

  12. Felsberg, M., Hedborg, J.: Real-time visual recognition of objects and scenes using P-channel matching. In: Proceedings of 15th Scandinavian Conference on Image Analysis. LNCS, vol. 4522, pp. 908–917 (2007)

  13. Ferraro, M., Caelli, T.M.: Lie transformation groups, integral transforms, and invariant pattern recognition. Spat. Vis. 8(4), 33–44 (1994)

  14. Fisher, R.B., Dawson-Howe, K., Fitzgibbon, A., Robertson, C., Trucco, E.: Dictionary of Computer Vision and Image Processing. Wiley, London (2005)

  15. Forssén, P.-E.: Low and medium level vision using channel representations. PhD thesis, Linköping University, Sweden (2004)

  16. Gazzaniga, M.S., Ivry, R.B., Mangun, G.R.: Cognitive Neuroscience, 2nd edn. W. W. Norton & Company, New York (2002)

  17. Gopalsamy, K.: Stability of artificial neural networks with impulses. Appl. Math. Comput. 154(3), 783–813 (2004)

  18. Granlund, G.H.: The complexity of vision. Signal Process. 74(1), 101–126 (1999)

  19. Granlund, G.H.: An associative perception–action structure using a localized space variant information representation. In: Proceedings of Algebraic Frames for the Perception–Action Cycle (AFPAC), Kiel, Germany (2000)

  20. Granlund, G.H., Knutsson, H.: Signal Processing for Computer Vision. Kluwer, Dordrecht (1995)

  21. Granlund, G.H., Moe, A.: Unrestricted recognition of 3-d objects for robotics using multi-level triplet invariants. Artif. Intell. Mag. 25(2), 51–67 (2004)

  22. Gustafsson, F.: Adaptive Filtering and Change Detection. Wiley, London (2000)

  23. Hol, J., Schön, T.B., Luinge, H., Slycke, P., Gustafsson, F.: Robust real-time tracking by fusing measurements from inertial and vision sensors. J. Real-Time Image Process. (2007). doi:10.1007/s11554-007-0040-2

  24. Johansson, B., Elfving, T., Kozlov, V., Censor, Y., Forssén, P.-E., Granlund, G.: The application of an oblique-projected Landweber method to a model of supervised learning. Math. Comput. Model. 43, 892–909 (2006)

  25. Jonsson, E., Felsberg, M.: Reconstruction of probability density functions from channel representations. In: Proceedings of 14th Scandinavian Conference on Image Analysis. LNCS, vol. 3540, pp. 491–500 (2005). doi:10.1007/11499145_50

  26. Jonsson, E., Felsberg, M.: Accurate interpolation in appearance-based pose estimation. In: Proceedings of 15th Scandinavian Conference on Image Analysis. LNCS, vol. 4522, pp. 1–10 (2007)

  27. Knutsson, H., Andersson, M.: Robust N-dimensional orientation estimation using quadrature filters and tensor whitening. In: Proceedings of IEEE International Conference on Acoustics, Speech, & Signal Processing, Adelaide, Australia (1994)

  28. Krüger, N.: Learning object representations using a priori constraints within ORASSYLL. Neural Comput. 13(2), 389–410 (2001)

  29. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60(2), 91–110 (2004)

  30. Mühlich, M., Mester, R.: A considerable improvement in non-iterative homography estimation using TLS and equilibration. Pattern Recognit. Lett. 22, 1181–1189 (2001)

  31. Murphy-Chutorian, E., Aboutalib, S., Triesch, J.: Analysis of a biologically-inspired system for real-time object recognition. Cogn. Sci. Online 3, 1–14 (2005)

  32. Nistér, D., Stewénius, H.: Scalable recognition with a vocabulary tree. In: IEEE Comput. Vis. Pattern Recognit., vol. 2, pp. 2161–2168 (2006). doi:10.1109/CVPR.2006.264

  33. Obdržálek, Š., Matas, J.: Sub-linear indexing for large scale object recognition. In: Clocksin, W.F., Fitzgibbon, A.W., Torr, P.H.S. (eds.) BMVC 2005: Proceedings of the 16th British Machine Vision Conference, vol. 1, pp. 1–10. BMVA, London (2005)

  34. Pontil, M., Verri, A.: Support vector machines for 3d object recognition. IEEE Trans. Pattern Anal. Mach. Intell. 20(6), 637–646 (1998)

  35. Roobaert, D., Zillich, M., Eklundh, J.-O.: A pure learning approach to background-invariant object recognition using pedagogical support vector learning. In: IEEE Comput. Vis. Pattern Recognit., vol. 2, pp. 351–357 (2001)

  36. Skoglund, J., Felsberg, M.: Evaluation of subpixel tracking algorithms. In: International Symposium on Visual Computing. LNCS, vol. 4292, pp. 374–382 (2006)

  37. Skoglund, J., Felsberg, M.: Covariance estimation for SAD block matching. In: Proceedings of 15th Scandinavian Conference on Image Analysis. LNCS, vol. 4522, pp. 372–382 (2007)

  38. Snippe, H.P., Koenderink, J.J.: Discrimination thresholds for channel-coded systems. Biol. Cybern. 66, 543–551 (1992)

  39. Chandaria, J., Stricker, D., Thomas, G.: The MATRIS project: real-time markerless camera tracking for AR and broadcast applications. J. Real-Time Image Process (2007, in this issue)

  40. Turk, M., Pentland, A.: Eigenfaces for recognition. J. Cogn. Neurosci. 3(1), 71–86 (1991)

  41. Unser, M.: Splines—a perfect fit for signal and image processing. IEEE Signal Process. Mag. 16, 22–38 (1999)

  42. Vedaldi, A.: An open implementation of SIFT. http://vision.ucla.edu/~vedaldi/code/sift/sift.html. Accessed 23 May 2007

Acknowledgments

We thank our project partners for providing the test data used in the experiments. We thank in particular Graham Thomas, Jigna Chandaria, Gabriele Bleser, Reinhard Koch, and Kevin Koeser.

Author information

Correspondence to Michael Felsberg.

Additional information

This work has been supported by the CENIIT project CAIRIS (http://www.cvl.isy.liu.se/Research/Object/CAIRIS), EC Grants IST-2003-004176 COSPAL and IST-2002-002013 MATRIS. This paper does not represent the opinion of the European Community, and the European Community is not responsible for any use which may be made of its contents.


Cite this article

Felsberg, M., Hedborg, J. Real-time view-based pose recognition and interpolation for tracking initialization. J Real-Time Image Proc 2, 103–115 (2007). https://doi.org/10.1007/s11554-007-0044-y
