Polysemous Codes

  • Matthijs Douze
  • Hervé Jégou
  • Florent Perronnin
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9906)


This paper considers the problem of approximate nearest neighbor search in the compressed domain. We introduce polysemous codes, which offer both the distance estimation quality of product quantization and the efficient comparison of binary codes with Hamming distance. Their design is inspired by algorithms introduced in the 1990s to construct channel-optimized vector quantizers. At search time, this dual interpretation accelerates the search: most of the indexed vectors are filtered out with Hamming distance, leaving only a fraction of the vectors to be ranked with an asymmetric distance estimator. The method is complementary to a coarse partitioning of the feature space such as the inverted multi-index. This is shown by our experiments on several public benchmarks such as the BIGANN dataset comprising one billion vectors, for which we report state-of-the-art results for query times below 0.3 milliseconds per core. Last but not least, our approach allows the approximate computation of the k-NN graph associated with the Yahoo Flickr Creative Commons 100M dataset, described by CNN image descriptors, in less than 8 h on a single machine.
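The two-stage search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names (`encode`, `adc_tables`, `polysemous_search`), the toy data, and the threshold are all hypothetical. In particular, the index assignment is made "polysemous" by construction (each sub-codebook holds the four corners of a unit square, indexed so that the Hamming distance between two indices equals the squared distance between their centroids), whereas the paper obtains such an assignment through an optimization step that is not reproduced here.

```python
# Toy sketch: Hamming pre-filter followed by asymmetric distance ranking.
# All names and data below are illustrative assumptions, not the paper's code.

def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((p - q) ** 2 for p, q in zip(u, v))

def hamming(a, b):
    """Hamming distance between two codes, each index read as a bit string."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

def encode(x, codebooks):
    """Product quantization: nearest centroid index in each subspace."""
    dsub = len(x) // len(codebooks)
    code = []
    for m, cb in enumerate(codebooks):
        sub = x[m * dsub:(m + 1) * dsub]
        code.append(min(range(len(cb)), key=lambda k: sq_dist(sub, cb[k])))
    return tuple(code)

def adc_tables(q, codebooks):
    """Per-subspace look-up tables: distance from the raw query to each centroid."""
    dsub = len(q) // len(codebooks)
    return [[sq_dist(q[m * dsub:(m + 1) * dsub], c) for c in cb]
            for m, cb in enumerate(codebooks)]

def polysemous_search(q, codes, codebooks, threshold, k=1):
    """Filter by Hamming distance, then rank the survivors with the tables."""
    q_code = encode(q, codebooks)
    tables = adc_tables(q, codebooks)
    survivors = [i for i, c in enumerate(codes)
                 if hamming(q_code, c) <= threshold]
    scored = sorted((sum(tables[m][idx] for m, idx in enumerate(codes[i])), i)
                    for i in survivors)
    return scored[:k], len(survivors)

# Assumed toy codebook: index 2*x + y maps to corner (x, y), so the Hamming
# distance between indices matches the squared distance between centroids.
CORNERS = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
CODEBOOKS = [CORNERS, CORNERS]  # two subspaces of dimension 2

DB = [
    (0.1, 0.0, 0.9, 1.0),
    (0.9, 1.0, 0.1, 0.0),
    (0.0, 0.9, 1.0, 0.9),
    (1.0, 0.1, 0.0, 0.1),
]
Q = (0.0, 0.2, 1.0, 0.8)

CODES = [encode(v, CODEBOOKS) for v in DB]
TOP, N_SURVIVORS = polysemous_search(Q, CODES, CODEBOOKS, threshold=1, k=1)
print("nearest index:", TOP[0][1], "ranked", N_SURVIVORS, "of", len(DB))
```

In this toy setting the Hamming test discards half of the database before any look-up table is read, which is the source of the speed-up the abstract refers to. The threshold trades recall for speed and would need to be tuned on real data.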



We are very grateful to Armand Joulin and Laurens van de Maaten for providing the Flickr100M images and their CNN descriptors.



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Matthijs Douze (1)
  • Hervé Jégou (1)
  • Florent Perronnin (1)

  1. Facebook AI Research, Paris, France
