Abstract
This paper considers the problem of approximate nearest neighbor search in the compressed domain. We introduce polysemous codes, which offer both the distance estimation quality of product quantization and the efficient comparison of binary codes with Hamming distance. Their design is inspired by algorithms introduced in the 90’s to construct channeloptimized vector quantizers. At search time, this dual interpretation accelerates the search. Most of the indexed vectors are filtered out with Hamming distance, letting only a fraction of the vectors to be ranked with an asymmetric distance estimator. The method is complementary with a coarse partitioning of the feature space such as the inverted multiindex. This is shown by our experiments performed on several public benchmarks such as the BIGANN dataset comprising one billion vectors, for which we report stateoftheart results for query times below 0.3 millisecond per core. Last but not least, our approach allows the approximate computation of the kNN graph associated with the Yahoo Flickr Creative Commons 100M, described by CNN image descriptors, in less than 8 h on a single machine.
Keywords
 Asymmetric Distance Computation
 Hamming Distance
 Product Quantization (PQ)
 Approximate Nearest Neighbor
 Distance Estimation
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
Download conference paper PDF
1 Introduction
Nearest neighbor search, or more generally similarity search, has received a sustained attention from different research communities in the last decades. The computer vision community has been especially active on this subject, which is of utmost importance when dealing with very large visual collections.
While early approximate nearest neighbor (ANN) methods were mainly optimising the tradeoff between speed and accuracy, many recent works [4, 27, 37, 45] put memory requirements as a central criterion for several reasons. For instance, due to the memory hierarchy, using less memory means using faster memory: disks are slower than the main memory, the main memory is slower than CPU caches, etc. Accessing the memory may be the bottleneck of the search. Therefore algorithms using compact codes are likely to offer a better efficiency than those relying on full vectors. For these reasons, we focus on ANN search with compact codes, which are able to make search in vector sets comprising as much as one billion vectors on a single machine.
We distinguish two separate lines of research in ANN with compact codes. The first class of methods proposes to map the original vectors to the Hamming hypercube [33, 37, 49, 51]. The resulting bitvectors are efficiently compared with the Hamming distance thanks to optimized lowlevel processor instructions such as xor and popcnt, available both on CPUs and GPUs. Another increasingly popular approach [5, 18, 27, 39, 55, 56] is to adopt a quantization point of view to achieve a better distance estimation for a given code size. While these two classes of approaches are often seen as contenders, they both have their advantages and drawbacks. Binary codes offer a faster elementary distance computation and do not need external metadata once the codes are produced. In contrast, quantizationbased approaches achieve better memory/accuracy operating points.
The polysemous codes introduced in this paper offer the best of both worlds. They can be compared either with binary codes, which is especially useful in a filtering step, or with the asymmetric distance estimator of product quantization approaches. The key aspect to attain this dual interpretation is the learning procedure. Our approach is inspired by works on channeloptimized vector quantization [16]. We start by training a product quantizer [27]. We then optimize the socalled index assignment of the centroids to binary codes. In other terms, we reorder the numeration of the centroids such that distances between similar centroids are small in the Hamming space, as illustrated in Fig. 1.
As a result, our method is almost on par both with quantizationbased methods in terms of accuracy and binary methods with respect to search efficiency. When combining this approach with a complementary approach such as the inverted multiindex [4], we outperform the state of the art by a large margin, as shown by our experiments carried out on several large public benchmarks. Interestingly, the high efficiency of our approach offers a scalable solution to the allneighbor problem, i.e., to compute the kNN graph, for the large image collection Flickr100M described by 4,096 dimensional vectors.
This paper is organized as follows. After briefly reviewing related works on ANN search with compact codes in Sect. 2, Sect. 3 describes the design of polysemous codes. The experiments analyzing our approach and comparing to the stateoftheart are detailed in Sect. 4. Finally Sect. 5 illustrates our method on the task of constructing an image graph on a large scale.
2 Related Work: ANN with Compact Codes
The literature on efficient search with compact codes is vast and we refer the reader to two recent surveys [48, 50] for extensive references on this subject. In this section, we present only a few popular approaches.
Compact Binary Codes. LocalitySensitive hashing [11, 19, 24] is a pioneering binary encoding technique. Charikar [11] shows under some assumptions that the Hamming distance is statistically related to the cosine similarity (equivalently the Euclidean distance for normalized vectors). Bruteforce comparison of binary hashes has been seen as a viable option for efficient image search with memory constraints [37], which was popularized by subsequent works evidencing the scalability of this approach to millionsized image collections [44]. Additionally, Norouzi and Fleet have proposed an algorithm to speedup the search in this Hamming space [40]. Many variants have been subsequently proposed, such as spectral hashing [51] or ITQ [20] – see also [42, 49, 53] for representative works. Related to our work, the kmeans hashing method [23] first produces a vector quantizer where the produced codes are compared with the Hamming distance.
QuantizationBased Codes. Several works have primarily focused on optimizing the tradeoff between memory and distance estimation. In particular, it is shown that vector quantizers [27] satisfying the Lloyd conditions [22] offer statistical guarantees on the square Euclidean distance estimator, which is bounded in expectation by the quantizer squared loss. These quantizationbased methods include product quantization (PQ) [27] and its optimized versions “optimized product quantization” [18] and “Cartesian kmeans” [39].
These approaches are effective for approximate search within large collections of visual descriptors. Subsequent works [5, 55, 56] have pushed possible memory/efficiency tradeoff by adopting a more general point of view, such as “Additive quantization” [5], which provides an excellent approximation and search performance, yet obtained with a much higher computational encoding cost [7]. In between PQ and this general formulation, good tradeoffs are achieved by residual quantizers [10, 30], which are routinely used in the nonexhaustive PQ variant [27] to reduce the quantization loss by encoding the residual error vector instead of the original vector, but also as a coding strategy on its own [1, 12, 38].
Nonexhaustive Search. The aforementioned methods for ANN search limit the memory usage per indexed vector and provide a distance estimator that is faster to compute than the exact distance. However, the search is still exhaustive in the sense that the query is compared to all database elements. For billionsized collections, reading the codes in memory is a severe constraint leading to search times in the order of a second, typically. The limitation imposed by this memory bottleneck has led to twostage approaches [2, 26, 27, 31], in which the feature space is first partitioned through hashing or clustering. Practically, an inverted list storing identifiers and corresponding compact codes is stored for each region. At query time, the distance is estimated only for the codes associated with a subset of regions [4, 27, 36]. It is also possible to use multiple partitions as in early LSH papers, as done in joint inverted indexing [52]. These solutions however require several indexing structures and are therefore not competitive in terms of memory usage. Various partitioning methods have been proposed for the coarse level [4, 41]. In particular, the inverted multiindex uses product quantization both to define the coarse level and for coding residual vectors. This strategy offers stateoftheart performance when further combined with a reranking strategy based on codes [29].
Motivation: Binary Codes Versus QuantizationBased Approaches. The Hamming distance is significantly faster to evaluate than the distance estimator based on table lookups involved in quantization methods^{Footnote 1}. From our measurements, the acceleration factor is typically between \(4\times \) and \(7\times \), depending on the code length. However, binary methods suffer limitations imposed by the Hamming space. First, the number of possible distances is at most d+1, where d is the binary vector length. This problem is partially solved by asymmetric variants of LSH [15, 21, 25], whose estimations use compact codes for database vectors but not on the query side. Yet such asymmetric measures require lookups, like the methods derived from product quantization, and are therefore more expensive to evaluate than the Hamming distance. On the other hand, quantizationbased methods offer a better memory/accuracy compromise, which is expected since binarization is a particular case of quantization.
Binary and quantizationbased codes have their own advantages and drawbacks. While the literature usually presents binary and quantizedbased codes as concurrent methods, the next section introduces a method that benefits from the advantages of both classes of methods.
3 Polysemous Codes
We, now, introduce our strategy to take advantage of the fast computation of Hamming distances while offering the estimation accuracy of quantizationbased methods. The main idea is to learn a regular product quantizer [27], and then to optimize the assignment of centroid indexes to binary codes such that the Hamming distance approximates the intercentroid distance. In this section, we first describe the objective functions optimized to achieve this property, and then describe the optimization algorithm.
Note that for a product quantizer, one typically optimizes separately each of the constituent subquantizers. Therefore, in what follows, we have one objective function (and optimization process) per subquantizer.
3.1 Objective Functions
We consider two objective functions: one that minimizes a loss based on the distance estimator and one that minimizes a ranking loss.
Notation. A quantizer is usually described by its set of centroids. Let \(\mathcal{I}\) be the set of centroid indexes: \(\mathcal{I}= \{0, 1, \ldots , 2^d1\}\) and \(d=8\) if each (sub)quantizer encodes the original vectors on one byte as is standard practice. Let \(c_i\) be the reproduction value associated with centroid i. Let \(d: \mathbb {R}^D \times \mathbb {R}^D \rightarrow \mathbb {R}^+\) be a distance between centroids, for instance the Euclidean distance. Let \(\pi : \mathcal{I}\rightarrow \{0,1\}^d\) denote a bijective function that maps each centroid index to a different vertex of the unit hypercube. Finally let \(h: \{0,1\}^d \times \{0,1\}^d \rightarrow \mathbb {R}^+\) be the Hamming distance between two ddimensional binary representations.
Distance Estimator Loss. One possible objective if to find the bijective map \(\pi \) such that the distance \(d(c_i,c_j)\) between two centroids is approximated by the Hamming distance \(h(\pi (i), \pi (j))\) between the two corresponding binary codes:
where \(f:\mathbb {R}\rightarrow \mathbb {R}\) is a monotonously increasing function that maps the distance \(d(c_i,c_j)\) between codewords into a range comparable to Hamming distances. In practice, we choose for f a simple linear mapping. This choice is motivated by the following observations. The Hamming distance between two binary vectors randomly drawn from \(\{0,1\}^d\) follows a binomial distribution with mean d / 2 and variance d / 4. Assuming that the distribution of distances \(d(c_i,c_j)\) can be approximated by a Gaussian distribution – which is a good approximation of the binomial – with mean \(\mu \) and standard deviation sigma \(\sigma \), we can force the two distributions to have the same mean and variance. This yields:
where \(\mu \) and \(\sigma \) are measured empirically.
As, in the context of kNN, it is more important to approximate small distances than large ones, we found out in practice that it is beneficial to weight the distances in the objective function (1). This leads to a weighted objective:
We choose a function \(w: \mathbb {R}\rightarrow \mathbb {R}\) of the form \(w(u) = \alpha ^{u}\) with \(\alpha <1\). In our experiments, we set \(\alpha =1/2\) but we found out that values of \(\alpha \) in the range [0.2, 0.6] yielded similar results.
Ranking Loss. In the context of kNN search, we are interested in finding a bijective map \(\pi \) that preserves the ranking of codewords. For this purpose, we adopt an Information Retrieval perspective. Let (i, j) be a pair of codewords such that i is assumed to be a “query” and j is assumed to be “relevant” to i. We will later discuss the choice of \((\text{ query },\text{ relevant })\) pairs. We take as negatives for query i the codewords k such that \(d(c_i,c_j)<d(c_i,c_k)\). The loss for pair (i, j) may be defined as:
where \(\mathbbm {1}[u] = 1\) if u is true, 0 otherwise. It measures how many codewords k are closer to i than j according to the Hamming distance while, i is closer to j than k according to the distance between centroids. We note that the previous loss measures the number of correctly ranked pairs which is closely related to Kendall’s tau coefficient.
An issue with the loss \(r_\pi (i,j)\) is that it gives the same weight to the top of the list as to the bottom. However, in ranking problems it is desirable to give more weight to errors occurring in the top ranks. Therefore, we do not use directly the loss \(r_\pi (i,j)\) for the pair (i, j), but adopt instead a loss that increases sublinearly with \(r_\pi (i,j)\). More specifically, we follow [46] and introduce a monotonously decreasing sequence \(\alpha _i\) as well as the sequence \(\ell _j = \sum _{i=1}^j \alpha _i\), which increases sublinearly with j. We define the weighted loss for pair (i, j) as \(\ell _{r_\pi (i,j)}\).
A subsequent question is how to choose pairs (i, j). One possibility would be to choose j among the kNNs of i, in which case we would optimize
An issue with this approach is that it requires choosing an arbitrary length k for the NN list. An alternative is to consider all \(j \ne i\) as being potentially “relevant” to i but to downweight the contribution of those j’s which are further away from i. In such a case, we optimize
where we recall that \(\alpha _i\) is a decreasing sequence and r(i, j) is the rank of j in the ordered list of neighbors to i:
In all our ranking experiments, we use Eq. (6) and choose \(\alpha _i = 1/i\) (following [46]).
3.2 Optimization
The aforementioned objective functions aim at finding a bijective map \(\pi \), or equivalently another numeration of the set of PQ centroids, that would assign similar binary codes to neighbouring centroids.
This problem is similar to that of channel optimized vector quantization [16, 17, 54], for which researchers have designed quantizers such that the corruption of a bit by the channel impacts the reconstruction as little as possible. This is a discrete optimisation problem that can not be relaxed, and for which we can only target a local minimum, as the set of possible bijective maps is huge. In the coding literature, such index assignment problems were first optimised in a greedy manner, for instance by using the binary switching algorithm [54]. Starting from an initial index assignment, at each iteration, this algorithm tests all possible bit swaps (i.e., d), and keeps the one providing the best update of the objective function. As shown by Farvardin [16], this strategy rapidly gets trapped in a poor local minimum. To our knowledge, the best approach to index assignment problems is to employ simulated annealing to carry out the optimization. This choice was shown to be significantly better [16] than previous greedy approaches. The algorithm aims at optimizing a loss \(L(\pi )\) that depends on the bijective mapping \(\pi \) defined as a table of size \(2^d\). It proceeds as follows
The algorithm depends on the number of iterations \(N_\mathrm {iter}=500,000\), the initial “temperature” \(t_0=0.7\) and \(t_\mathrm {decay} = 0.9^{1/500}\), i.e. decrease by a factor 0.9 every 500 iterations. Evaluating the distance estimation loss (resp ranking loss) has a complexity in \(\mathcal {O}(2^{2d})\) (resp. \(\mathcal {O}(2^{3d})\)). However, computing the cost update incurred by a swap can be implemented in \(\mathcal {O}(2^d)\) (resp. \(\mathcal {O}(2^{2d})\)).
Figure 2 shows on a set of SIFT descriptors that our optimization is effective: the comparison of the codes used as binary vectors is much more correlated with the true distance after than before the optimization.
3.3 Discussion
Although the optimization algorithm is similar to those previously employed in channeloptimized vector quantization, our objective functions are significantly different to reflect our application scenario. In communications, it is unlikely that many bit errors occur simultaneously, in particular not on a memoryless channel. Therefore the objective functions employed in communication focus on small Hamming distances. In contrast, for ANN the typical Hamming distances of the neighbors are relatively large.
We point out that, while the proposed binarized PQ codes offer a competitive performance, their accuracy is significantly lower than that of PQ. This suggests a twostep strategy for largescale search. Given a query, we first filter out the majority of the database items using the fast Hamming distance on the binarized PQ codes. We then evaluate the more costly asymmetric distances for the items whose Hamming distance was below a given threshold \(\tau \).
Other strategies could be contemplated for the filtering stage. One such strategy is to measure how many quantization indexes differ for the product quantizer^{Footnote 2}. In other terms, one can filter out vectors if more than a given number of subquantizers produce indexes not identical to those of the queries. As shown in the experimental Sect. 4, this method is not as efficient nor precise as the strategy proposed in this section.
Another such strategy would be to use for the filtering stage a binary encoding technique unrelated to PQ, e.g., ITQ. The issue is that it would increase the memory requirements of the method as it would involve storing ITQ codes and PQ codes. In constrat, we only store one polysemous code per database item in the proposed approach – a must if the emphasis is on storage requirements.
4 Experiments
This section gives an analysis and evaluates our polysemous codes. After introducing the evaluation protocol, we analyze our core approach in different aspects. Then we show that our approach is compatible with the inverted multiindex (IMI) and give a comparison against the state of the art.
4.1 Evaluation Protocol
We analyze and evaluate our approach with standard benchmarks for ANN as well as a new benchmark that we introduce to evaluate the search quality.

SIFT1M is a benchmark [29] of 128dimensional SIFT descriptors [35]. There are one million vectors in the database, plus 100,000 vectors for training and 10,000 query vectors. This is a relatively small set that we mainly use for parameter analysis.

BIGANN is a largescale benchmark [29] widely used for ANN search, also made of SIFT descriptors. It comprises one billion database vectors, 100 million training vectors and 10,000 queries.

FYCNN1M and FYCNN90M are introduced to evaluate the quality of the search with more challenging features. We leverage the Yahoo Flickr Creative Commons 100M^{Footnote 3} image collection [43] as follows. In FYCNN90M, we split the dataset into three sets: 90M vectors are to be indexed, 10k vectors serve as queries, 5M vectors are used for training. FYCNN1M uses the same training set and queries, but the indexed set is restricted to the first million images for the purpose of analyzing our method. We extract convolutional neural networks features [34] following the guidelines of [8]: we compute the activations of the 7^{th} layer of AlexNet [32]. This yields 4096dimensional image descriptors. Prior to indexing we reduce these descriptors to 256D with PCA and subsequently apply a random rotation [8, 28].
For all datasets, the accuracy is evaluated by recall@R. This metric measures the fraction of the queries for which the true nearest neighbor is returned within the top R results. All reported times are on a single core of a 2.8 GHz machine.
4.2 Analysis of Polysemous Codes Performance
We first analyze the performance of polysemous codes. Let us introduce notations. We first consider three ways of constructing a product quantizer:
 PQ :

is the baseline: we directly use the code produced by the product quantizer, without any optimization of the index assignment;
 PolyD :

refers to a product quantizer whose index assignment is optimized by minimizing the distance estimator loss introduced in Sect. 3.1;
 PolyR :

similarly refers to a PQ optimized with the proposed ranking loss.
Once the codebook and index assignment are learned, we consider the following methods to estimate distances based on polysemous codes:
 ADC :

is the regular comparison based on an asymmetric distance estimator [27];
 binary :

refers to the bitwise comparison with the Hamming distance when the codes are regarded as bitvectors, like for binary codes (e.g., ITQ);
 disidx :

counts how many subquantizers give different codes (see Sect. 3.3);
 dual :

refers to the strategy employing both interpretations of polysemous codes: the Hamming codes are used to filterout the database vectors whose distance to the query is above a threshold \(\tau \). The indexed vectors satisfying this test are compared with the asymmetric distance estimator.
Note: Polysemous codes are primarily PQ codes. Therefore the performance of polysemous codes and regular PQ is identical when the comparison is independent from the index assignment, which is the case for ADC and disidx. For instance the combinations PolyD/ADC, PolyR/ADC and PQ/ADC are equivalent both in efficiency and accuracy.
Table 1 details the performance of the aforementioned PQ constructions. First, note that the accuracy of disidx is low, and that it is also relatively slow due to the lack of a dedicated machine instruction. Second, these results show that our index assignment optimization is very effective for improving the quality of the binary comparison. Without this optimization, the binary comparison is ineffective both to rank results (PQ/binary), and to filter (PQ/dual). The ranking loss PolyR is slightly inferior to PolyD, so we adopt the latter in the following.
Figure 3 confirms the relevance of PolyD/dual. It gives the performance achieved by this method for varying Hamming thresholds \(\tau \), which parametrizes the tradeoff between speed and accuracy. Polysemous codes allow us to make almost no compromise: attaining the quality of PQ/ADC only requires a minor sacrifice in search time compared to binary codes. With threshold \(\tau =54\), 90–95 % of the points are filtered out; for \(\tau =42\) this raises to more than 99.5 %.
Convergence: Figure 4 shows the performance of the binary filtering as a function of the number of iterations. The algorithm typically converges in a few hundred thousand iterations (1 iteration \(=\) 1 test of possible index swaps). For a set of PQ subquantizer with 256 centroids each, this means a few seconds for the distance reconstruction loss PolyR and up to one hour for the ranking loss PolyR.
4.3 Comparison with the State of the Art
As mentioned in the related work Sect. 2, for large datasets the best tradeoffs between accuracy, search time and memory are obtained by hybrid methods [4, 27] that combine a preliminary space partitioning, typically implemented through clustering, with compact codes learned on residual vectors. That is why we combine our polysemous codes with IMI [4]. This method partitions the space with a product quantizer (the “coarse” partitioning level) and uses PQ to encode residual error vectors. The search proceeds by selecting a few inverted lists at the coarse level, and then use the residual PQ codes to estimate distances for the vectors associated with the selected lists. We further optimize the computation of the lookup tables involved in PQ when multiple lists are probed [6], and use an optimized rotation before encoding with PQ [18].
Building upon this method, we learn polysemous codes for the residual PQ, which allows us to introduce an intermediate stage to filter out most of the list items, hence avoiding most of the distance estimation with PQ. Table 2 gives a comparison against stateoftheart algorithms on the BIGANN dataset. We report both the timings reported for concurrent methods and our improved reimplementation of IMI. Note already that our system obtains very competitive results compared to the original IMI. Note, in the case where a single query vector is searched at a time, as opposed to batch mode, the coarse quantization becomes 50 to 60 % more expensive. Therefore, in the following we use \(K=4096^2\) to target more aggressive operating points by reducing the fixed cost of the coarse quantizer. In this case, the results of PolyD/dual gives a clear improvement compared to IMI\(\star \) and the state of the art. In particular, with 16 bytes we are able to achieve a recall@1 = 0.217 in less than 1 ms on one core (0.38 ms in single query mode, 0.64 ms in batch mode). The binary filter divides by about 2\(\times \) the search time, inducing only a small reduction of the Recall@1 score.
We now compare our method on the more challenging FYCNN90M benchmark, for which a single query amounts to searching an image in a collection containing 90 million images. Figure 5 reports the performance achieved by different methods. First observe that the nonexhaustive methods (bottom) are at least 2 orders of magnitude faster than the methods that compare the codes exhaustively (top), like ITQ. The former are able to find similar images in a few seconds. Again, our polysemous strategy IMI+PolyD/dual offers a competitive advantage over its competitor IMI. Our method is approximately \(1.5\times \) faster for a negligible loss in accuracy.
5 Application: LargeScale KNN Image Graph
As an application to our fast indexing scheme, we now consider the problem of building the approximate kNN graph of a very large image collection. For this experiment, we make use of the 95,063,295 images available in the Flickr 100M dataset. As was the case in Sect. 4, we use 4,096D AlexNet features reduced to 256D with PCA. For the graph construction, we simply compute the kNN with \(k=100\) for each image in turn. This takes 7h44 using 20 threads of a CPU server. Note that the collection that we consider is significantly larger than the ones considered in previous works [14, 47] on kNN graph. Moreover, our approach may be complementary with the method proposed by Dong et al. [14].
For visualization purposes, we seek the modes following a random walk technique [13]: we first iteratively compute the stationary distribution of the walk, (i.e. the probability of each node to be visited during the random walk) and then consider as modes each local maximum of the stationary probability in the graph. We find on the order of 3,000 such maxima. Figure 6 depicts a sample of these maxima as well as their closest neighbors. We believe that these results are representative of the typical quality of the found neighbors, except that, for privacy reasons, we do not show the numerous modes corresponding to faces, of which we found many including specialized modes of “pairs of persons”, “clusters of more than two persons” or “baby faces”.
6 Conclusion
In this work, we introduced polysemous codes, i.e., codes that can be interpreted both as binary codes and as product quantization codes. These complementary views are exploited for very largescale indexing in a simple twostage process that first involves filteringout the majority of irrelevant indexes using the fast Hamming distance on binary codes, and then reordering the short list of candidates using the more precise but also slower asymmetric distance on PQ codes. This yields a competitive indexing scheme that combines the best of both worlds: its speed is comparable to that of pure binary techniques and its accuracy matches that of PQ.
Notes
 1.
A recent method [3] reduces the scanning by employing a lowerbounding lookup table stored in SIMD registers, however this method is efficient only on long inverted lists, which makes it suboptimal in the usual setting.
 2.
Formally, this quantity is also called Hamming distance, but measured between vector of indexes and not binary vectors.
 3.
Out of which only 95M are available for download today.
References
Ai, L., Yu, J., Wu, Z., He, Y., Guan, T.: Optimized residual vector quantization for efficient approximate nearest neighbor search. Multimedia Syst. 1–13 (2015)
Andoni, A., Indyk, P., Nguyen, H.L., Razenshteyn, I.: Beyond localitysensitive hashing. In: SODA, pp. 1018–1028 (2014)
André, F., Kermarrec, A.M., le Scouarnec, N.: Cache locality is not enough: highperformance nearest neighbor search with product quantization fast scan. In: Proceedings of the International Conference on Very Large DataBases (2015)
Babenko, A., Lempitsky, V.: The inverted multiindex. In: CVPR, June 2012
Babenko, A., Lempitsky, V.: Additive quantization for extreme vector compression. In: CVPR, June 2014
Babenko, A., Lempitsky, V.: Improving bilayer product quantization for billionscale approximate nearest neighbors in high dimensions. arXiv preprint arXiv:1404.1831 (2014)
Babenko, A., Lempitsky, V.: Tree quantization for largescale similarity search and classification. In: CVPR, June 2015
Babenko, A., Slesarev, A., Chigorin, A., Lempitsky, V.: Neural codes for image retrieval. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014, Part I. LNCS, vol. 8689, pp. 584–599. Springer, Heidelberg (2014)
Balu, R., Furon, T., Jégou, H.: Beyond project and sign for distance estimation with binary codes. In: ICASSP, April 2014
Barnes, C.F., Rizvi, S., Nasrabadi, N.: Advances in residual vector quantization: a review. IEEE Trans. Image Process. 5(2), 226–262 (1996)
Charikar, M.S.: Similarity estimation techniques from rounding algorithms. In: STOC, pp. 380–388, May 2002
Chen, Y., Guan, T., Wang, C.: Approximate nearest neighbor search by residual vector quantization. Sensors 10(12), 11259–11273 (2010)
Cho, M., Lee, K.M.: Modeseeking on graphs via random walks. In: CVPR, June 2012
Dong, W., Charikar, M., Li, K.: Efficient knearest neighbor graph construction for generic similarity measures. In: WWW, March 2011
Dong, W., Charikar, M., Li, K.: Asymmetric distance estimation with sketches for similarity search in highdimensional spaces. In: SIGIR, pp. 123–130, July 2008
Farvardin, N.: A study of vector quantization for noisy channels. IEEE Trans. Inform. Theor. 36(5), 799–809 (1990)
Farvardin, N., Vaishampayan, V.: On the performance and complexity of channeloptimized vector quantizers. IEEE Trans. Inform. Theor. 37(1), 155–160 (1991)
Ge, T., He, K., Ke, Q., Sun, J.: Optimized product quantization for approximate nearest neighbor search. In: CVPR, June 2013
Gionis, A., Indyk, P., Motwani, R.: Similarity search in high dimension via hashing. In: Proceedings of the International Conference on Very Large DataBases, pp. 518–529 (1999)
Gong, Y., Lazebnik, S.: Iterative quantization: a procrustean approach to learning binary codes. In: CVPR, June 2011
Gordo, A., Perronnin, F.: Asymmetric distances for binary embeddings. In: CVPR (2011)
Gray, R.M., Neuhoff, D.L.: Quantization. IEEE Trans. Inform. Theor. 44, 2325–2384 (1998)
He, K., Wen, F., Sun, J.: Kmeans hashing: an affinitypreserving quantization method for learning binary compact codes. In: CVPR (2013)
Indyk, P., Motwani, R.: Approximate nearest neighbors: towards removing the curse of dimensionality. In: STOC, pp. 604–613 (1998)
Jain, M., Jégou, H., Gros, P.: Asymmetric hamming embedding. In: ACM Multimedia, October 2011
Jegou, H., Douze, M., Schmid, C.: Hamming embedding and weak geometric consistency for large scale image search. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008, Part I. LNCS, vol. 5302, pp. 304–317. Springer, Heidelberg (2008)
Jégou, H., Douze, M., Schmid, C.: Product quantization for nearest neighbor search. IEEE Trans. PAMI 33(1), 117–128 (2011)
Jégou, H., Douze, M., Schmid, C., Pérez, P.: Aggregating local descriptors into a compact image representation. In: CVPR, June 2010
Jégou, H., Tavenard, R., Douze, M., Amsaleg, L.: Searching in one billion vectors: rerank with source coding. In: ICASSP, May 2011
Juang, B.H., Gray, A.J.: Multiple stage vector quantization for speech coding. In: ICASSP, vol. 7, pp. 597–600. IEEE (1982)
Kalantidis, Y., Avrithis, Y.: Locally optimized product quantization for approximate nearest neighbor search. In: CVPR, June 2014
Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: NIPS, December 2012
Kulis, B., Grauman, K.: Kernelized localitysensitive hashing for scalable image search. In: ICCV, October 2009
LeCun, Y., Boser, B., Denker, J., Henderson, D., Howard, R., Hubbard, W., Jackel, L.: Handwritten digit recognition with a backpropagation network. In: Advances in Neural Information Processing Systems 2, NIPS (1989)
Lowe, D.G.: Distinctive image features from scaleinvariant keypoints. IJCV 60(2), 91–110 (2004)
Lv, Q., Josephson, W., Wang, Z., Charikar, M., Li, K.: Multiprobe LSH: efficient indexing for highdimensional similarity search. In: Proceedings of the International Conference on Very Large DataBases, pp. 950–961 (2007)
Lv, Q., Charikar, M., Li, K.: Image similarity search with compact data structures. In: CIKM, pp. 208–217, November 2004
Martinez, J., Hoos, H.H., Little, J.J.: Stacked quantizers for compositional vector compression. arXiv preprint arXiv:1411.2173 (2014)
Norouzi, M., Fleet, D.: Cartesian kmeans. In: CVPR, June 2013
Norouzi, M., Punjani, A., Fleet, D.J.: Fast search in hamming space with multiindex hashing. In: CVPR (2012)
Paulevé, L., Jégou, H., Amsaleg, L.: Locality sensitive hashing: a comparison of hash function types and querying mechanisms. Pattern Recogn. Lett. 31(11), 1348–1358 (2010)
Raginsky, M., Lazebnik, S.: Localitysensitive binary codes from shiftinvariant kernels. In: NIPS (2010)
Thomee, B., Shamma, D.A., Friedland, G., Elizalde, B., Ni, K., Poland, D., Borth, D., Li, L.J.: The new data and new challenges in multimedia research. arXiv preprint arXiv:1503.01817, March 2015
Torralba, A., Fergus, R., Freeman, W.T.: 80 million tiny images: a large database for nonparametric object and scene recognition. IEEE Trans. PAMI 30(11), 1958–1970 (2008)
Torralba, A., Fergus, R., Weiss, Y.: Small codes and large databases for recognition. In: CVPR, June 2008
Usunier, N., Buffoni, D., Gallinari, P.: Ranking with ordered weighted pairwise classification. In: ICML, June 2009
Wang, J., Wang, J., Zeng, G., Tu, Z., Gan, R., Li, S.: Scalable kNN graph construction for visual descriptors. In: CVPR, pp. 1106–1113, June 2012
Wang, J., Shen, H.T., Song, J., Ji, J.: Hashing for similarity search: a survey. arXiv preprint arXiv:1408.2927 (2014)
Wang, J., Kumar, S., Chang, S.F.: Semisupervised hashing for large scale search. IEEE Trans. PAMI 6(12), 1 (2012)
Wang, J., Liu, W., Kumar, S., Chang, S.: Learning to hash for indexing big data  a survey. CoRR abs/1509.05472 (2015). http://arxiv.org/abs/1509.05472
Weiss, Y., Torralba, A., Fergus, R.: Spectral hashing. In: NIPS, December 2009
Xia, Y., He, K., Wen, F., Sun, J.: Joint inverted indexing. In: ICCV, December 2013
Xu, H., Wang, J., Li, Z., Zeng, G., Li, S., Yu, N.: Complementary hashing for approximate nearest neighbor search. In: ICCV, November 2011
Zeger, K., Gersho, A.: Pseudogray coding. IEEE Trans. Commun. 38(12), 2147–2158 (1990)
Zhang, T., Du, C., Wang, J.: Composite quantization for approximate nearest neighbor search. In: ICML, pp. 838–846, June 2014
Zhang, T., Qi, G.J., Tang, J., Wang, J.: Sparse composite quantization. In: CVPR, June 2015
Acknowledgements
We are very grateful to Armand Joulin and Laurens van de Maaten for providing the Flicrk100M images and their CNN descriptors.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Douze, M., Jégou, H., Perronnin, F. (2016). Polysemous Codes. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds) Computer Vision – ECCV 2016. ECCV 2016. Lecture Notes in Computer Science(), vol 9906. Springer, Cham. https://doi.org/10.1007/9783319464756_48
Download citation
DOI: https://doi.org/10.1007/9783319464756_48
Published:
Publisher Name: Springer, Cham
Print ISBN: 9783319464749
Online ISBN: 9783319464756
eBook Packages: Computer ScienceComputer Science (R0)