Abstract
Computing distances between large sets of SIFT descriptors is a basic step in numerous computer vision algorithms. When the number of descriptors is large, as is often the case, computing these distances can be extremely time-consuming. We propose the SIFTpack: a compact way of storing SIFT descriptors that enables significantly faster distance calculations between sets of SIFTs than current solutions. SIFTpack can represent SIFTs extracted densely from a single image or sparsely from multiple different images. We show that the SIFTpack representation saves both storage space and run time, both when finding nearest neighbors and when computing all pairwise distances between descriptors. The usefulness of SIFTpack is demonstrated as an alternative implementation for K-means dictionaries of visual words and for image retrieval.
Acknowledgements
This research was supported in part by the Ollendorf Foundation and by the Israel Ministry of Science. We would also like to thank Prof. Michael Elad for useful conversations and good ideas.
Copyright information
© 2016 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Gilinsky, A., Zelnik-Manor, L. (2016). SIFTpack: A Compact Representation for Efficient SIFT Matching. In: Hassner, T., Liu, C. (eds) Dense Image Correspondences for Computer Vision. Springer, Cham. https://doi.org/10.1007/978-3-319-23048-1_6
DOI: https://doi.org/10.1007/978-3-319-23048-1_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23047-4
Online ISBN: 978-3-319-23048-1
eBook Packages: Engineering, Engineering (R0)