
Bag of indexes: a multi-index scheme for efficient approximate nearest neighbor search

  • Topical collection: Content-Based Multimedia Indexing in the era of Artificial Intelligence
  • Published in: Multimedia Tools and Applications

Abstract

In recent years, the problem of Content-Based Image Retrieval (CBIR) has been addressed in many different ways, achieving excellent results on small-scale datasets. As the amount of data to evaluate grows, new issues need to be considered and new techniques become necessary in order to create an efficient yet accurate system. In particular, computational time and memory occupancy need to be kept as low as possible, whilst the retrieval accuracy has to be preserved as much as possible. For this reason, a brute-force approach is no longer feasible, and an Approximate Nearest Neighbor (ANN) search method is preferable. This paper reviews state-of-the-art ANN methods, with a particular focus on indexing systems, and proposes a new ANN technique called Bag of Indexes (BoI). The new technique is compared with the state of the art on several public benchmarks, obtaining 86.09% accuracy on Holidays+Flickr1M, 99.20% on SIFT1M and 92.4% on GIST1M. Notably, these state-of-the-art accuracy results are obtained by the proposed approach with a very low retrieval time, making it excellent in the trade-off between accuracy and efficiency.



Notes

  1. The C++ code is available at https://www.github.com/fmaglia/BoI


Acknowledgements

This research benefits from the HPC (High Performance Computing) facility of the University of Parma, Italy.

This work is partially funded by Regione Emilia Romagna under the “Piano triennale alte competenze per la ricerca, il trasferimento tecnologico e l’imprenditorialità”.

Author information


Corresponding author

Correspondence to Federico Magliani.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A - LSH projection algorithm

As mentioned in Section 2.3, the hash function used for projecting the vectors is the following:

$$ h(\mathbf{x})=sign(\mathbf{x}\cdot\mathbf{v}) $$
(10)

where x represents the input feature vector and v represents the projection vector. Before the calculation starts, the vectors v need to be generated, with values sampled from a Gaussian distribution \(\mathcal {N}(0,1)\). The number of such vectors depends on the hash dimension (δ), the number of hash tables (L) and the dimension of the input vector. Then, the correct bucket can be computed through the LSH projections. Assuming δ = 8 and L = 10, 80 LSH projections are computed for each image descriptor, but only 10 buckets per image are obtained (one for each hash table): for each hash table, δ projections are computed, so each descriptor is ultimately projected into only L buckets. Applying more projections per hash table (i.e., increasing the hash dimension δ) improves the robustness of the LSH approach and reduces the chance of projecting different elements into nearby buckets, because increasing the number of bits used (δ) increases the number of possible buckets for the final projection. To summarize, once the projection vectors are available, the dot product between the input vector and each projection vector is computed; whenever it is greater than zero, the corresponding power of two is added to the bucket index.
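The procedure above can be sketched in a few lines of NumPy (a minimal illustration under the stated δ = 8, L = 10 setting, not the paper's actual C++ implementation; the function name `lsh_bucket` and the array shapes are assumptions):

```python
import numpy as np

def lsh_bucket(x, projections):
    """Map a descriptor to a bucket index in one hash table.

    projections: (delta, d) matrix with entries sampled from N(0, 1).
    Each positive dot product sets one bit of the bucket index,
    i.e. contributes a power of two, as in Eq. (10).
    """
    bits = (projections @ x) > 0          # delta sign tests
    powers = 2 ** np.arange(len(bits))    # bit i -> 2**i
    return int(np.dot(bits, powers))

# One set of delta projection vectors per hash table: L tables in total.
rng = np.random.default_rng(0)
delta, L, d = 8, 10, 128                  # hash size, hash tables, descriptor dim
tables = rng.standard_normal((L, delta, d))

x = rng.standard_normal(d)                # example image descriptor
buckets = [lsh_bucket(x, tables[i]) for i in range(L)]  # one bucket per table
```

Each descriptor thus triggers δ · L = 80 dot products but lands in exactly L = 10 buckets, one per hash table.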

Appendix B - Memory requirements

The memory requirements of the ANN algorithm depend on the number of images considered. For example, if 1M images are represented by 1M descriptors of 128D (float = 4 bytes), the brute-force approach requires 0.5 GB (1M × 128 × 4 bytes). For the same task, LSH needs only 100 MB, because it stores one index per descriptor for each of the L = 100 hash tables, and each index is stored in a single byte (8 bits). In addition, the proposed BoI only requires an additional 4 MB to store 1M weights, allowing the proposed approach to scale to larger datasets better than the brute-force approach.
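The arithmetic above can be verified with a back-of-the-envelope sketch (a check of the figures stated in the text, not part of the original implementation; the 4-byte weight size is an assumption consistent with the 4 MB total):

```python
# Memory estimates for 1M images, assuming 4-byte floats and 1-byte bucket indexes.
n = 1_000_000

brute_force = n * 128 * 4   # 128D float descriptors -> 512 MB (~0.5 GB)
lsh = n * 100 * 1           # one 1-byte index per descriptor in each of L = 100 tables
boi_weights = n * 4         # one 4-byte weight per image for BoI

print(brute_force // 10**6, "MB")  # 512 MB
print(lsh // 10**6, "MB")          # 100 MB
print(boi_weights // 10**6, "MB")  # 4 MB
```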


About this article


Cite this article

Magliani, F., Fontanini, T. & Prati, A. Bag of indexes: a multi-index scheme for efficient approximate nearest neighbor search. Multimed Tools Appl 80, 23135–23156 (2021). https://doi.org/10.1007/s11042-020-10262-4
