International Journal of Computer Vision, Volume 65, Issue 1–2, pp 43–72

A Comparison of Affine Region Detectors

  • K. Mikolajczyk
  • T. Tuytelaars
  • C. Schmid
  • A. Zisserman
  • J. Matas
  • F. Schaffalitzky
  • T. Kadir
  • L. Van Gool
Abstract

The paper gives a snapshot of the state of the art in affine covariant region detectors and compares their performance on a set of test images under varying imaging conditions. Six types of detectors are included: detectors based on affine normalization around Harris (Mikolajczyk and Schmid, 2002; Schaffalitzky and Zisserman, 2002) and Hessian points (Mikolajczyk and Schmid, 2002); a detector of ‘maximally stable extremal regions’, proposed by Matas et al. (2002); an edge-based region detector (Tuytelaars and Van Gool, 1999); a detector based on intensity extrema (Tuytelaars and Van Gool, 2000); and a detector of ‘salient regions’, proposed by Kadir, Zisserman and Brady (2004). The performance is measured against changes in viewpoint, scale, illumination, defocus and image compression.

The objective of this paper is also to establish a reference test set of images and performance software, so that future detectors can be evaluated in the same framework.

Keywords

affine region detectors; invariant image description; local features; performance evaluation

References

  1. Baumberg, A. 2000. Reliable feature matching across widely separated views. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Hilton Head Island, South Carolina, USA, pp. 774–781.
  2. Brown, M. and Lowe, D. 2003. Recognizing panoramas. In Proceedings of the International Conference on Computer Vision, Nice, France, pp. 1218–1225.
  3. Canny, J. 1986. A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8: 679–698.
  4. Csurka, G., Dance, C., Bray, C., and Fan, L. 2004. Visual categorization with bags of keypoints. In Proceedings Workshop on Statistical Learning in Computer Vision.
  5. Dorko, G. and Schmid, C. 2003. Selection of scale invariant neighborhoods for object class recognition. In Proceedings International Conference on Computer Vision, Nice, France, pp. 634–640.
  6. Fergus, R., Perona, P., and Zisserman, A. 2003. Object class recognition by unsupervised scale-invariant learning. In Proceedings IEEE Conference on Computer Vision and Pattern Recognition, Madison, Wisconsin, USA.
  7. Ferrari, V., Tuytelaars, T., and Van Gool, L. 2001. Simultaneous object recognition and segmentation by image exploration. In Proceedings European Conference on Computer Vision, Prague, Czech Republic, pp. 40–54.
  8. Ferrari, V., Tuytelaars, T., and Van Gool, L. 2005. Simultaneous object recognition and segmentation from single or multiple model views. International Journal of Computer Vision, to appear.
  9. Goedeme, T., Tuytelaars, T., and Van Gool, L. 2004. Fast wide baseline matching for visual navigation. In Proceedings IEEE Conference on Computer Vision and Pattern Recognition, Washington, DC, USA, pp. 24–29.
  10. Harris, C. and Stephens, M. 1988. A combined corner and edge detector. In Alvey Vision Conference, pp. 147–151.
  11. Hartley, R.I. and Zisserman, A. 2004. Multiple View Geometry in Computer Vision, 2nd edition, Cambridge University Press, ISBN: 0521540518.
  12. Kadir, T., Zisserman, A., and Brady, M. 2004. An affine invariant salient region detector. In Proceedings of the 8th European Conference on Computer Vision, Prague, Czech Republic, pp. 345–457.
  13. Lazebnik, S., Schmid, C., and Ponce, J. 2003a. A sparse texture representation using affine-invariant regions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Madison, Wisconsin, USA, pp. 319–324.
  14. Lazebnik, S., Schmid, C., and Ponce, J. 2003b. Affine-invariant local descriptors and neighborhood statistics for texture recognition. In Proceedings of the International Conference on Computer Vision, Nice, France, pp. 649–655.
  15. Lazebnik, S., Schmid, C., and Ponce, J. 2005. A sparse texture representation using local affine regions. IEEE Transactions on Pattern Analysis and Machine Intelligence 27(8):1265–1278.
  16. Lindeberg, T. and Gårding, J. 1997. Shape-adapted smoothing in estimation of 3-D shape cues from affine deformations of local 2-D brightness structure. Image and Vision Computing 15(6):415–434.
  17. Lindeberg, T. 1998. Feature detection with automatic scale selection. International Journal of Computer Vision 30(2):79–116.
  18. Lowe, D. 1999. Object recognition from local scale-invariant features. In Proceedings of the 7th International Conference on Computer Vision, Kerkyra, Greece, pp. 1150–1157.
  19. Lowe, D. 2004. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision 60(2):91–110.
  20. Matas, J., Burianek, J., and Kittler, J. 2000. Object Recognition using the Invariant Pixel-Set Signature. In Proceedings of the British Machine Vision Conference, London, UK, pp. 606–615.
  21. Matas, J., Chum, O., Urban, M., and Pajdla, T. 2002. Robust wide-baseline stereo from maximally stable extremal regions. In Proceedings of the British Machine Vision Conference, Cardiff, UK, pp. 384–393.
  22. Matas, J., Chum, O., Urban, M., and Pajdla, T. 2004. Robust wide-baseline stereo from maximally stable extremal regions. Image and Vision Computing 22(10):761–767.
  23. Mikolajczyk, K. and Schmid, C. 2001. Indexing based on scale invariant interest points. In Proceedings of the 8th International Conference on Computer Vision, Vancouver, Canada.
  24. Mikolajczyk, K. and Schmid, C. 2002. An affine invariant interest point detector. In Proceedings of the 7th European Conference on Computer Vision, Copenhagen, Denmark.
  25. Mikolajczyk, K. and Schmid, C. 2003. A performance evaluation of local descriptors. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Madison, Wisconsin, USA.
  26. Mikolajczyk, K., Zisserman, A., and Schmid, C. 2003. Shape recognition with edge-based features. In Proceedings of the British Machine Vision Conference, Norwich, UK.
  27. Mikolajczyk, K. and Schmid, C. 2004. Scale & affine invariant interest point detectors. International Journal of Computer Vision 60(1):63–86.
  28. Mikolajczyk, K. and Schmid, C. 2005. A performance evaluation of local descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence 27(10):1615–1630.
  29. Obdržálek, Š. and Matas, J. 2002. Object recognition using local affine frames on distinguished regions. In Proceedings of the British Machine Vision Conference, Cardiff, UK, pp. 113–122.
  30. Opelt, A., Fussenegger, M., Pinz, A., and Auer, P. 2004. Weak hypotheses and boosting for generic object detection and recognition. In Proceedings of European Conference on Computer Vision, Prague, Czech Republic, pp. 71–84.
  31. Pritchett, P. and Zisserman, A. 1998. Wide baseline stereo matching. In Proceedings of the 6th International Conference on Computer Vision, Bombay, India, pp. 754–760.
  32. Rothganger, F., Lazebnik, S., Schmid, C., and Ponce, J. 2003. 3D object modeling and recognition using affine-invariant patches and multi-view spatial constraints. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Madison, Wisconsin, USA, pp. 272–277.
  33. Rothganger, F., Lazebnik, S., Schmid, C., and Ponce, J. 2005. Object modeling and recognition using local affine-invariant image descriptors and multi-view spatial constraints. International Journal of Computer Vision, to appear.
  34. Schaffalitzky, F. and Zisserman, A. 2002. Multi-view matching for unordered image sets, or “How do I organize my holiday snaps?”. In Proceedings of the 7th European Conference on Computer Vision, Copenhagen, Denmark, pp. 414–431.
  35. Schaffalitzky, F. and Zisserman, A. 2003. Automated Location matching in movies. Computer Vision and Image Understanding, 92(2):236–264.
  36. Schmid, C. and Mohr, R. 1997. Local grayvalue invariants for image retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence 19(5):530–535.
  37. Se, S., Lowe, D., and Little, J. 2002. Mobile robot localization and mapping with uncertainty using scale-invariant visual landmarks. International Journal of Robotics Research 21(8):735–758.
  38. Sedgewick, R. 1988. Algorithms, 2nd edition. Addison-Wesley.
  39. Sivic, J. and Zisserman, A. 2003. Video google: A text retrieval approach to object matching in videos. In Proceedings of the International Conference on Computer Vision, Nice, France.
  40. Sivic, J., Schaffalitzky, F., and Zisserman, A. 2004. Object level grouping for video shots. In Proceedings of the 8th European Conference on Computer Vision, Prague, Czech Republic, pp. 724–734.
  41. Sivic, J. and Zisserman, A. 2004. Video data mining using configurations of viewpoint invariant regions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Washington, DC, USA, pp. 488–495.
  42. Tell, D. and Carlsson, S. 2000. Wide baseline point matching using affine invariants computed from intensity profiles. In Proceedings of the 6th European Conference on Computer Vision, Dublin, Ireland, pp. 814–828.
  43. Tell, D. and Carlsson, S. 2002. Combining appearance and topology for wide baseline matching. In Proceedings of the 7th European Conference on Computer Vision, Copenhagen, Denmark, pp. 68–81.
  44. Turina, A., Tuytelaars, T., and Van Gool, L. 2001. Efficient Grouping under perspective skew. In Proceedings IEEE Conference on Computer Vision and Pattern Recognition, Hawaii, USA, pp. 247–254.
  45. Tuytelaars, T. and Van Gool, L. 1999. Content-based image retrieval based on local affinely invariant regions. In Int. Conf. on Visual Information Systems, pp. 493–500.
  46. Tuytelaars, T., Van Gool, L., D'haene, L., and Koch, R. 1999. Matching of affinely invariant regions for visual servoing. In Int. Conference Robotics and Automation ICRA 99.
  47. Tuytelaars, T. and Van Gool, L. 2000. Wide baseline stereo matching based on local, affinely invariant regions. In Proceedings of the 11th British Machine Vision Conference, Bristol, UK, pp. 412–425.
  48. Tuytelaars, T. and Van Gool, L. 2004. Matching Widely Separated Views based on Affine Invariant Regions. International Journal of Computer Vision 59(1):61–85.

Copyright information

© Springer Science + Business Media, Inc. 2005

Authors and Affiliations

  • K. Mikolajczyk (1)
  • T. Tuytelaars (2)
  • C. Schmid (4)
  • A. Zisserman (1)
  • J. Matas (3)
  • F. Schaffalitzky (1)
  • T. Kadir (1)
  • L. Van Gool (2)

  1. University of Oxford, Oxford, United Kingdom
  2. University of Leuven, Leuven, Belgium
  3. Czech Technical University, Prague, Czech Republic
  4. INRIA, GRAVIR-CNRS, Montbonnot, France