SIFTpack: A Compact Representation for Efficient SIFT Matching

  • Alexandra Gilinsky
  • Lihi Zelnik-Manor


Computing distances between large sets of SIFT descriptors is a basic step in numerous computer vision algorithms. When the number of descriptors is large, as is often the case, computing these distances can be extremely time consuming. We propose the SIFTpack: a compact way of storing SIFT descriptors, which enables significantly faster distance calculations between sets of SIFT descriptors than current solutions. SIFTpack can be used to represent SIFTs densely extracted from a single image or sparsely extracted from multiple different images. We show that the SIFTpack representation saves both storage space and run time, both when finding nearest neighbors and when computing all distances between all descriptors. The usefulness of SIFTpack is demonstrated as an alternative implementation for K-means dictionaries of visual words and for image retrieval.
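
To make the cost concrete, the following is a minimal sketch of the brute-force baseline that the abstract describes as time consuming; it is not the SIFTpack method itself. It assumes descriptors are plain 128-dimensional vectors compared with squared Euclidean distance, and all function names, as well as the random stand-in data, are hypothetical and chosen only for illustration.

```python
import random


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def all_pairs_distances(set_a, set_b):
    """Brute-force distance table: entry [i][j] compares set_a[i] with set_b[j]."""
    return [[squared_l2(a, b) for b in set_b] for a in set_a]


def nearest_neighbors(set_a, set_b):
    """For each descriptor in set_a, the index of its closest descriptor in set_b."""
    table = all_pairs_distances(set_a, set_b)
    return [min(range(len(row)), key=row.__getitem__) for row in table]


def random_descriptors(count, dim=128, seed=0):
    """Hypothetical stand-in for real SIFT output: random 128-D integer vectors."""
    rng = random.Random(seed)
    return [[rng.randint(0, 255) for _ in range(dim)] for _ in range(count)]


if __name__ == "__main__":
    a = random_descriptors(50, seed=1)
    b = random_descriptors(80, seed=2)
    nn = nearest_neighbors(a, b)
    print(len(nn), "queries, each costing", len(b), "distance evaluations")
```

With N descriptors in one set and M in the other, this baseline performs N × M distance evaluations over 128 dimensions each, which is the quantity a more compact representation would aim to reduce.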


Keywords (added by machine, not by the authors): Image Retrieval · Visual Word · Storage Space · Image Patch · Representation Error



This research was supported in part by the Ollendorf Foundation and by the Israel Ministry of Science. We would also like to thank Prof. Michael Elad for useful conversations and good ideas.



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Technion Israel Institute of Technology, Haifa, Israel
