
Bag of indexes: a multi-index scheme for efficient approximate nearest neighbor search

  • Topical collection: Content-Based Multimedia Indexing in the era of Artificial Intelligence
  • Published in: Multimedia Tools and Applications

Abstract

In recent years, the problem of Content-Based Image Retrieval (CBIR) has been addressed in many different ways, achieving excellent results on small-scale datasets. As the amount of data to evaluate grows, new issues need to be considered and new techniques become necessary in order to create an efficient yet accurate system. In particular, computational time and memory occupancy need to be kept as low as possible, whilst the retrieval accuracy has to be preserved as much as possible. For this reason, a brute-force approach is no longer feasible, and an Approximate Nearest Neighbor (ANN) search method is preferable. This paper reviews state-of-the-art ANN methods, with a particular focus on indexing systems, and proposes a new ANN technique called Bag of Indexes (BoI). The new technique is compared with the state of the art on several public benchmarks, obtaining 86.09% accuracy on Holidays+Flickr1M, 99.20% on SIFT1M and 92.4% on GIST1M. Notably, these state-of-the-art accuracy results are obtained by the proposed approach with a very low retrieval time, making it excellent in the trade-off between accuracy and efficiency.



Notes

  1. The C++ code is available at https://www.github.com/fmaglia/BoI


Acknowledgements

This research benefits from the HPC (High Performance Computing) facility of the University of Parma, Italy.

This work is partially funded by Regione Emilia Romagna under the “Piano triennale alte competenze per la ricerca, il trasferimento tecnologico e l’imprenditorialità”.

Author information


Corresponding author

Correspondence to Federico Magliani.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A - LSH projection algorithm

As mentioned in Section 2.3, the hash function used for projecting the vectors is the following:

$$ h(\mathbf{x})=sign(\mathbf{x}\cdot\mathbf{v}) $$
(10)

where x represents the input feature vector and v represents the projection vector. Before the calculation starts, the vectors v need to be generated, with values sampled from a Gaussian distribution \(\mathcal {N}(0,1)\). The number of such vectors depends on the hash dimension (δ), the number of hash tables (L) and the dimension of the input vector. Then, the correct bucket can be computed through the LSH projections. Assuming δ = 8 and L = 10, 80 LSH projections are computed for each image descriptor, but only 10 buckets per image are obtained (one for each hash table): for each hash table, δ projections are computed, so each descriptor is ultimately projected into only L buckets. Applying more projections per hash table (i.e., increasing the hash dimension δ) improves the robustness of the LSH approach and reduces the chance of projecting different elements into nearby buckets, because increasing the number of bits used (δ) increases the number of possible buckets for the final projection. To summarize, once the projection vectors are available, the dot product between the input vector and each projection vector is computed; whenever it is greater than zero, the corresponding power of two is added to the bucket index.
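The procedure above can be sketched in a few lines of NumPy (a minimal illustration under the stated δ = 8, L = 10 setting, not the paper's actual C++ implementation; the function name `lsh_bucket` and the array shapes are assumptions):

```python
import numpy as np

def lsh_bucket(x, projections):
    """Map a descriptor to a bucket index in one hash table.

    projections: (delta, d) matrix with entries sampled from N(0, 1).
    Each positive dot product sets one bit of the bucket index,
    i.e. contributes a power of two, as in Eq. (10).
    """
    bits = (projections @ x) > 0          # delta sign tests
    powers = 2 ** np.arange(len(bits))    # bit i -> 2**i
    return int(np.dot(bits, powers))

# One set of delta projection vectors per hash table: L tables in total.
rng = np.random.default_rng(0)
delta, L, d = 8, 10, 128                  # hash size, hash tables, descriptor dim
tables = rng.standard_normal((L, delta, d))

x = rng.standard_normal(d)                # example image descriptor
buckets = [lsh_bucket(x, tables[i]) for i in range(L)]  # one bucket per table
```

Each descriptor thus triggers δ · L = 80 dot products but lands in exactly L = 10 buckets, one per hash table.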

Appendix B - Memory requirements

The memory requirements of the ANN algorithm depend on the number of images considered. For example, if 1M images are represented by 1M descriptors of 128D (float = 4 bytes), the brute-force approach requires 0.5 GB (1M × 128 × 4 bytes). For the same task, LSH needs only 100 MB, because it stores one index per descriptor for each of the L = 100 hash tables, and each index is stored in a single byte (8 bits). In addition, the proposed BoI only requires an additional 4 MB to store 1M weights, allowing the proposed approach to scale to larger datasets better than the brute-force approach.
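The arithmetic above can be verified with a back-of-the-envelope sketch (a check of the figures stated in the text, not part of the original implementation; the 4-byte weight size is an assumption consistent with the 4 MB total):

```python
# Memory estimates for 1M images, assuming 4-byte floats and 1-byte bucket indexes.
n = 1_000_000

brute_force = n * 128 * 4   # 128D float descriptors -> 512 MB (~0.5 GB)
lsh = n * 100 * 1           # one 1-byte index per descriptor in each of L = 100 tables
boi_weights = n * 4         # one 4-byte weight per image for BoI

print(brute_force // 10**6, "MB")  # 512 MB
print(lsh // 10**6, "MB")          # 100 MB
print(boi_weights // 10**6, "MB")  # 4 MB
```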


About this article


Cite this article

Magliani, F., Fontanini, T. & Prati, A. Bag of indexes: a multi-index scheme for efficient approximate nearest neighbor search. Multimed Tools Appl 80, 23135–23156 (2021). https://doi.org/10.1007/s11042-020-10262-4
