International Journal of Computer Vision, Volume 97, Issue 1, pp 18–35

Interesting Interest Points

A Comparative Study of Interest Point Performance on a Unique Data Set
  • Henrik Aanæs
  • Anders Lindbjerg Dahl
  • Kim Steenstrup Pedersen

Abstract

Not all interest points are equally interesting. The most valuable interest points lead to optimal performance of the computer vision method in which they are employed, but a measure of this kind depends on the chosen vision application. We propose a more general performance measure based on the spatial invariance of interest points under changing acquisition parameters, quantified by the spatial recall rate. The scope of this paper is to investigate the performance of a number of existing, well-established interest point detection methods. Automatic performance evaluation of interest points is hard because the true correspondence is generally unknown. We overcome this by providing an extensive data set with known spatial correspondence. The data is acquired with a camera mounted on a 6-axis industrial robot, providing very accurate camera positioning. Furthermore, each scene is scanned with a structured light scanner, resulting in precise 3D surface information. In total, 60 scenes are depicted, including model houses, building materials, fruit and vegetables, fabric, printed media, and more. Each scene is depicted from 119 camera positions, and 19 individual LED illuminations are used at each position. The LED illumination provides the option of artificially relighting the scene from a range of light directions. This data set has given us the ability to systematically evaluate the performance of a number of interest point detectors. The highlights of the conclusions are that the fixed-scale Harris corner detector performs best overall, followed by the Hessian-based detectors and the difference of Gaussians (DoG). The methods based on scale space features perform better overall than other methods, especially when varying the distance to the scene, where the FAST corner detector, Edge Based Regions (EBR), and Intensity Based Regions (IBR) perform poorly. The performance of Maximally Stable Extremal Regions (MSER) is moderate. We observe a relatively large decline in performance with changes in both viewpoint and light direction. Some of our observations support previous findings while others contradict them.
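The spatial recall rate described above, counting how many interest points reappear at the geometrically predicted position in another view, can be sketched as follows. This is a minimal illustration, not the paper's exact procedure: the function name, the brute-force nearest-point search, and the pixel tolerance are all assumptions for the sake of the example.

```python
import math

def spatial_recall(mapped_points, detected_points, tol=2.0):
    """Fraction of interest points from one view that, after being mapped
    into a second view via the known scene geometry, land within `tol`
    pixels of some point detected in that second view.

    `mapped_points` holds ground-truth reprojections (in the paper these
    would come from the robot-calibrated camera positions and the
    structured-light surface scan); `detected_points` is the detector's
    output in the second view. `tol` is an illustrative threshold.
    """
    if not mapped_points:
        return 0.0
    # Count mapped points that have at least one detection nearby.
    hits = sum(
        1
        for (mx, my) in mapped_points
        if any(math.hypot(mx - bx, my - by) <= tol
               for (bx, by) in detected_points)
    )
    return hits / len(mapped_points)
```

For example, a mapped point at (0, 0) with a detection at (0.5, 0.5) counts as recalled under a 2-pixel tolerance, while a mapped point with no detection within tolerance does not.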

Keywords

Benchmark data set · Interest point detectors · Performance evaluation · Object recognition · Scene matching



Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  • Henrik Aanæs (1)
  • Anders Lindbjerg Dahl (1)
  • Kim Steenstrup Pedersen (2)
  1. DTU Informatics, Technical University of Denmark, Lyngby, Denmark
  2. E-Science Center, Image Group, Department of Computer Science, University of Copenhagen, Copenhagen, Denmark
