International Journal of Computer Vision, Volume 91, Issue 1, pp 107–130

Learning Real-Time Perspective Patch Rectification

  • Stefan Hinterstoisser
  • Vincent Lepetit
  • Selim Benhimane
  • Pascal Fua
  • Nassir Navab

Abstract

We propose two learning-based methods for patch rectification that are faster and more reliable than state-of-the-art affine region detection methods. Given a reference view of a patch, they can quickly recognize it in new views and accurately estimate the homography between the reference view and the new view. Our methods consume more memory than affine region detectors and are, in practice, currently limited to a few tens of patches. However, if the reference image is a fronto-parallel view and the internal camera parameters are known, a single patch is often enough to estimate an object's pose precisely. As a result, we can handle, in real time, objects that are significantly less textured than those required by state-of-the-art methods.
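
The claim that one patch can suffice for pose estimation can be illustrated with the standard plane-to-image relation H ~ K [r1 r2 t]. The following is a minimal sketch, not the authors' implementation. It assumes that H maps metric coordinates on the patch plane (Z = 0) to pixels and that K is the known intrinsic matrix. The function name `pose_from_homography`, the helpers `rot_x` and `rot_y`, and all numeric values are hypothetical and chosen only for illustration.

```python
import numpy as np

def pose_from_homography(H, K):
    """Recover rotation R and translation t from H ~ K [r1 r2 t] (plane Z = 0)."""
    B = np.linalg.inv(K) @ H
    # H is only defined up to scale and sign; choose the sign that places
    # the plane in front of the camera (positive depth t_z).
    if B[2, 2] < 0:
        B = -B
    # The first two columns should be unit-length rotation columns.
    scale = 2.0 / (np.linalg.norm(B[:, 0]) + np.linalg.norm(B[:, 1]))
    r1, r2, t = scale * B[:, 0], scale * B[:, 1], scale * B[:, 2]
    R_approx = np.column_stack([r1, r2, np.cross(r1, r2)])
    # Project onto the nearest rotation matrix to absorb noise in H.
    U, _, Vt = np.linalg.svd(R_approx)
    return U @ Vt, t

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def rot_y(b):
    c, s = np.cos(b), np.sin(b)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

# Hypothetical camera and pose, used only to build a consistent homography.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
R_true = rot_x(0.3) @ rot_y(-0.2)
t_true = np.array([0.1, -0.2, 2.0])
# An arbitrary (even negative) scale factor must not change the result.
H = -3.7 * K @ np.column_stack([R_true[:, 0], R_true[:, 1], t_true])

R_est, t_est = pose_from_homography(H, K)
```

With a noise-free homography the recovered pose matches the one used to build H. With a noisy estimate, the SVD projection keeps R a valid rotation, at the cost of no longer reproducing H exactly.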

The first method favors fast run-time performance, while the second is designed for fast real-time learning and robustness. However, both follow the same general approach. First, a classifier provides, for every keypoint, an initial estimate of its transformation. This estimate then allows an accurate perspective rectification to be carried out using linear predictors. The last step, made possible by the accurate perspective rectification, is a fast verification of the patch identity together with a sub-pixel estimate of its position. We demonstrate the advantages of our approach on real-time 3D object detection and tracking applications.
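
The refinement step can be illustrated with a toy linear predictor: a matrix learned by least squares that maps intensity differences to a correction of the current estimate. This is a simplified sketch under strong assumptions, not the method of the paper. It handles translation only rather than a full homography, it uses a hypothetical analytic texture `image` in place of real images, and a zero vector stands in for the classifier's coarse initial estimate. All names and constants are illustrative.

```python
import numpy as np

def image(x, y):
    # Hypothetical smooth synthetic texture standing in for a real image.
    return np.sin(0.15 * x) + np.cos(0.11 * y) + 0.5 * np.sin(0.07 * (x + y))

# 5x5 grid of sample points around the patch centre (assumed layout).
offsets = np.arange(-10.0, 11.0, 5.0)
gx, gy = np.meshgrid(offsets, offsets)
px, py = 50.0 + gx.ravel(), 40.0 + gy.ravel()

def sample(d):
    """Intensities at the sample points shifted by d = (dx, dy)."""
    return image(px + d[0], py + d[1])

reference = sample(np.zeros(2))

def train_linear_predictor(n_samples=500, max_shift=2.0, seed=0):
    """Learn A such that A @ (sample(d) - reference) approximates d."""
    rng = np.random.default_rng(seed)
    shifts = rng.uniform(-max_shift, max_shift, size=(n_samples, 2))
    diffs = np.stack([sample(d) - reference for d in shifts])
    # Least-squares solution of diffs @ A.T ≈ shifts.
    A_T, *_ = np.linalg.lstsq(diffs, shifts, rcond=None)
    return A_T.T  # shape (2, number of sample points)

A = train_linear_predictor()

# New view: the same texture translated by an unknown amount.
true_shift = np.array([1.5, -1.0])

def new_image(x, y):
    return image(x - true_shift[0], y - true_shift[1])

estimate = np.zeros(2)  # stand-in for the classifier's coarse estimate
for _ in range(3):
    observed = new_image(px + estimate[0], py + estimate[1])
    # The predictor returns the remaining offset (estimate - true_shift).
    estimate = estimate - A @ (observed - reference)

error = np.linalg.norm(estimate - true_shift)
```

Each iteration costs a single matrix-vector product, which is why predictors of this kind suit real-time use. They are only valid within the range of perturbations seen during training, so a reasonable initial estimate is needed.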

Keywords

Patch rectification, Tracking by detection, Object recognition, Online learning, Real-time learning, Pose estimation

Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Stefan Hinterstoisser (1), email author
  • Vincent Lepetit (2)
  • Selim Benhimane (1)
  • Pascal Fua (2)
  • Nassir Navab (1)

  1. Computer Aided Medical Procedures (CAMP), Technische Universität München (TUM), Munich, Germany
  2. Computer Vision Laboratory (CVLab), École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland