A Linear Time Histogram Metric for Improved SIFT Matching

  • Ofir Pele
  • Michael Werman
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5304)


We present a new metric between histograms such as SIFT descriptors and a linear time algorithm for its computation. It is common practice to use the L2 metric for comparing SIFT descriptors. This practice assumes that SIFT bins are aligned, an assumption that often fails because of quantization, distortion, occlusion, etc.
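The bin-alignment issue can be illustrated with a toy example. The sketch below is not the metric proposed in the paper: it uses the classic 1-D EMD for equal-mass histograms with ground distance |i - j| (computed from running differences of the two histograms) and ignores the circular nature of orientation bins. The function names and the 8-bin histograms are hypothetical, chosen only for illustration.

```python
import math


def l2_distance(p, q):
    """Bin-to-bin Euclidean distance between two histograms."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def emd_1d(p, q):
    """Classic EMD for equal-mass, non-circular 1-D histograms.

    Assumes ground distance |i - j|; in that case the EMD equals the
    sum of absolute differences of the cumulative histograms.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    if not math.isclose(sum(p), sum(q)):
        raise ValueError("this sketch requires equal total mass")
    total = 0.0
    carry = 0.0  # mass that still has to be moved past the current bin
    for a, b in zip(p, q):
        carry += a - b
        total += abs(carry)
    return total


# Hypothetical 8-bin histograms: one unit of mass in a single bin.
base = [0, 1, 0, 0, 0, 0, 0, 0]
near = [0, 0, 1, 0, 0, 0, 0, 0]  # mass shifted by one bin
far = [0, 0, 0, 0, 0, 0, 1, 0]   # mass shifted by five bins

print(l2_distance(base, near), l2_distance(base, far))
print(emd_1d(base, near), emd_1d(base, far))
```

L2 assigns the same distance to the one-bin shift and the five-bin shift, because it compares only corresponding bins. The cross-bin distance grows with the size of the shift, which is the behaviour one wants when small misalignments come from quantization.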

In this paper we present a new Earth Mover’s Distance (EMD) variant. First, we show that it is a metric (unlike the original EMD [1], which is a metric only for normalized histograms); moreover, it is a natural extension of the L1 metric. Second, we propose a linear time algorithm for the computation of the EMD variant, with a robust ground distance for oriented gradients. Finally, extensive experimental results on the Mikolajczyk and Schmid dataset [2] show that our method outperforms state-of-the-art distances.
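To see why the original EMD is a metric only for normalized histograms, it may help to recall its definition from [1]. The notation below (histograms P and Q, flows f_ij, ground distance d_ij) is chosen here for illustration; the abstract itself does not give the formula of the new variant, so none is stated.

```latex
\[
\mathrm{EMD}(P,Q) \;=\;
\frac{\min_{F=\{f_{ij}\}} \sum_{i,j} f_{ij}\, d_{ij}}{\sum_{i,j} f_{ij}}
\quad \text{s.t.} \quad
f_{ij} \ge 0, \quad
\sum_{j} f_{ij} \le P_i, \quad
\sum_{i} f_{ij} \le Q_j, \quad
\sum_{i,j} f_{ij} = \min\Bigl(\sum_{i} P_i,\; \sum_{j} Q_j\Bigr).
\]
```

Because only the smaller of the two total masses has to be transported, surplus mass is ignored. For example, with P = (1, 0), Q = (2, 0) and d_ij = |i - j|, the single flow f_11 = 1 satisfies all constraints at zero cost, so EMD(P, Q) = 0 although P ≠ Q. A distance that is zero for two different histograms is not a metric, which is why the restriction to equal-mass (normalized) histograms is needed.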


Keywords (added by machine and not by the authors): Scale Invariant Feature Transform, JPEG Compression, Linear Time Algorithm, Oriented Gradient, Viewpoint Change


References

  1. Rubner, Y., Tomasi, C., Guibas, L.J.: The earth mover’s distance as a metric for image retrieval. International Journal of Computer Vision 40(2), 99–121 (2000)
  2. Mikolajczyk, K., Schmid, C.: A performance evaluation of local descriptors. IEEE Trans. Pattern Analysis and Machine Intelligence 27(10), 1615–1630 (2005)
  3. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vision 60(2), 91–110 (2004)
  4. Bay, H., Tuytelaars, T., Van Gool, L.: SURF: Speeded up robust features. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3951, pp. 404–417. Springer, Heidelberg (2006)
  5. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: CVPR, vol. 1 (2005)
  6. Heikkila, M., Pietikainen, M., Schmid, C.: Description of Interest Regions with Center-Symmetric Local Binary Patterns. In: ICVGIP, pp. 58–69 (2006)
  7. Ferrari, V., Tuytelaars, T., Van Gool, L.: Simultaneous object recognition and segmentation by image exploration. In: Pajdla, T., Matas, J. (eds.) ECCV 2004. LNCS, vol. 3021, pp. 40–54. Springer, Heidelberg (2004)
  8. Sudderth, E., Torralba, A., Freeman, W., Willsky, A.: Learning hierarchical models of scenes, objects, and parts. In: ICCV, vol. 2, pp. 1331–1338 (2005)
  9. Arth, C., Leistner, C., Bischof, H.: Robust Local Features and their Application in Self-Calibration and Object Recognition on Embedded Systems. In: CVPR (2007)
  10. Mikolajczyk, K., Leibe, B., Schiele, B.: Multiple object class detection with a generative model. In: CVPR (2006)
  11. Dorko, G., Schmid, C.: Selection of scale-invariant parts for object class recognition. In: ICCV, pp. 634–639 (2003)
  12. Opelt, A., Fussenegger, M., Pinz, A., Auer, P.: Weak hypotheses and boosting for generic object detection and recognition. In: Pajdla, T., Matas, J. (eds.) ECCV 2004. LNCS, vol. 3022. Springer, Heidelberg (2004)
  13. Sivic, J., Zisserman, A.: Video Google: a text retrieval approach to object matching in videos. In: ICCV, pp. 1470–1477 (2003)
  14. Philbin, J., Chum, O., Isard, M., Sivic, J., Zisserman, A.: Object retrieval with large vocabularies and fast spatial matching. In: CVPR (2007)
  15. Snavely, N., Seitz, S., Szeliski, R.: Photo tourism: exploring photo collections in 3D. ACM Transactions on Graphics (TOG) 25(3), 835–846 (2006)
  16. Sivic, J., Everingham, M., Zisserman, A.: Person Spotting: Video Shot Retrieval for Face Sets. In: Leow, W.-K., Lew, M., Chua, T.-S., Ma, W.-Y., Chaisorn, L., Bakker, E.M. (eds.) CIVR 2005. LNCS, vol. 3568, pp. 226–236. Springer, Heidelberg (2005)
  17. Se, S., Lowe, D., Little, J.: Local and global localization for mobile robots using visual landmarks. In: IROS, vol. 1 (2001)
  18. Brown, M., Lowe, D.: Recognising panoramas. In: ICCV, p. 3 (2003)
  19. Nowak, E., Jurie, F., Triggs, B.: Sampling strategies for bag-of-features image classification. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3954, pp. 490–503. Springer, Heidelberg (2006)
  20. Ling, H., Okada, K.: An Efficient Earth Mover’s Distance Algorithm for Robust Histogram Comparison. IEEE Trans. Pattern Analysis and Machine Intelligence 29(5), 840–853 (2007)
  21. Ling, H., Okada, K.: Diffusion distance for histogram comparison. In: CVPR, vol. 1, pp. 246–253 (2006)
  22. Werman, M., Peleg, S., Melter, R., Kong, T.: Bipartite graph matching for points on a line or a circle. Journal of Algorithms 7(2), 277–284 (1986)
  23.
  24. Shen, H., Wong, A.: Generalized texture representation and metric. Computer Vision, Graphics, and Image Processing 23(2), 187–206 (1983)
  25. Werman, M., Peleg, S., Rosenfeld, A.: A distance metric for multidimensional histograms. Computer Vision, Graphics, and Image Processing 32(3) (1985)
  26. Peleg, S., Werman, M., Rom, H.: A unified approach to the change of resolution: Space and gray-level. IEEE Trans. Pattern Analysis and Machine Intelligence 11(7), 739–742 (1989)
  27. Cha, S., Srihari, S.: On measuring the distance between histograms. Pattern Recognition 35(6), 1355–1370 (2002)
  28. Indyk, P., Thaper, N.: Fast image retrieval via embeddings. In: 3rd International Workshop on Statistical and Computational Theories of Vision (October 2003)
  29.
  30. Forssén, P., Lowe, D.: Shape Descriptors for Maximally Stable Extremal Regions. In: ICCV, pp. 1–8 (2007)
  31.
  32.
  33. Pele, O., Werman, M.: Robust real time pattern matching using bayesian sequential hypothesis testing. IEEE Trans. Pattern Analysis and Machine Intelligence 30(8), 1427–1443 (2008)
  34. Obdrzalek, S., Matas, J.: Sub-linear indexing for large scale object recognition. In: BMVC, vol. 1, pp. 1–10 (2005)
  35. Arya, S., Mount, D., Netanyahu, N., Silverman, R., Wu, A.: An optimal algorithm for approximate nearest neighbor searching in fixed dimensions. Journal of the ACM (JACM) 45(6), 891–923 (1998)
  36. Beis, J., Lowe, D.: Shape indexing using approximate nearest-neighbour search in high-dimensional spaces. In: CVPR, pp. 1000–1006 (1997)

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Ofir Pele (1)
  • Michael Werman (1)
  1. School of Computer Science and Engineering, The Hebrew University of Jerusalem, Israel
