Feature Harvesting for Tracking-by-Detection

  • Mustafa Özuysal
  • Vincent Lepetit
  • François Fleuret
  • Pascal Fua
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3953)


We propose a fast approach to 3–D object detection and pose estimation that owes its robustness to a training phase during which the target object slowly moves with respect to the camera. No additional information is provided to the system, save a very rough initialization in the first frame of the training sequence. After training, the system can detect the target object in each video frame independently.

Our approach relies on wide-baseline feature matching based on Randomized Trees. Unlike previous classification-based approaches to 3–D pose estimation, we do not require an a priori 3–D model. Instead, our algorithm learns both geometry and appearance. In the process, it collects, or harvests, a list of features that can be reliably recognized even when large motions and aspect changes cause complex variations in feature appearance. This is made possible by the great flexibility of Randomized Trees, which lets us add feature points to our list and remove them as needed with minimal extra computation.
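The abstract gives no implementation details, so the following is only a minimal sketch of why a randomized-tree classifier can make adding and removing feature points cheap. It assumes, hypothetically, that each node test compares two randomly chosen pixel intensities and that each leaf stores per-feature counts; the names `RandomizedTree` and `Forest` and all parameters are illustrative and do not come from the paper. Because the tests are drawn at random and not fitted to the data, the tree structure never needs rebuilding: adding a feature only increments leaf counts, and removing one only deletes them.

```python
import random
from collections import defaultdict


class RandomizedTree:
    """One tree whose node tests compare two random pixel intensities.

    Assumption: tests are drawn at random, not learned, so the tree
    structure does not depend on which features are currently stored.
    """

    def __init__(self, depth, patch_size, rng):
        self.depth = depth
        n_pixels = patch_size * patch_size
        # One pair of distinct pixel indices per internal node (heap layout).
        self.tests = [tuple(rng.sample(range(n_pixels), 2))
                      for _ in range(2 ** depth - 1)]
        # leaf index -> {feature_id: number of training patches seen}
        self.leaf_counts = defaultdict(lambda: defaultdict(int))

    def leaf_of(self, patch):
        """Drop a flattened patch down the tree and return its leaf index."""
        node = 0
        for _ in range(self.depth):
            a, b = self.tests[node]
            node = 2 * node + (1 if patch[a] < patch[b] else 2)
        return node

    def add_feature(self, feature_id, patches):
        # Adding a feature only updates counts in the leaves it reaches.
        for patch in patches:
            self.leaf_counts[self.leaf_of(patch)][feature_id] += 1

    def remove_feature(self, feature_id):
        # Removing a feature deletes its counts; no test is changed.
        for counts in self.leaf_counts.values():
            counts.pop(feature_id, None)


class Forest:
    """Several randomized trees whose leaf posteriors are summed as votes."""

    def __init__(self, n_trees, depth, patch_size, seed=0):
        rng = random.Random(seed)
        self.trees = [RandomizedTree(depth, patch_size, rng)
                      for _ in range(n_trees)]

    def add_feature(self, feature_id, patches):
        for tree in self.trees:
            tree.add_feature(feature_id, patches)

    def remove_feature(self, feature_id):
        for tree in self.trees:
            tree.remove_feature(feature_id)

    def classify(self, patch):
        """Return the best-scoring feature id, or None if no leaf has data."""
        votes = defaultdict(float)
        for tree in self.trees:
            counts = tree.leaf_counts.get(tree.leaf_of(patch), {})
            total = sum(counts.values())
            for feature_id, count in counts.items():
                votes[feature_id] += count / total
        return max(votes, key=votes.get) if votes else None
```

In a harvesting loop, one might call `add_feature` with patches of a newly observed point gathered over several training frames and call `remove_feature` for points that turn out to be unreliable. How the paper selects those points and trains on view changes is not stated in the abstract and is not modeled here.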


Keywords: Image Feature, Feature Point, Target Object, Image Patch, Training Sequence



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Mustafa Özuysal (1)
  • Vincent Lepetit (1)
  • François Fleuret (1)
  • Pascal Fua (1)
  1. Computer Vision Laboratory, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
