Binary Vectors for Fast Distance and Similarity Estimation

  • NEW TOOLS OF CYBERNETICS, INFORMATICS, COMPUTER ENGINEERING, AND SYSTEMS ANALYSIS
Cybernetics and Systems Analysis

Abstract

This review considers methods and algorithms for fast estimation of distance and similarity measures between initial data items from their vector representations with binary or integer-valued components. The initial data are mainly high-dimensional vectors compared under various distance measures (angular, Euclidean, and others) and similarity measures (cosine, inner product, and others). The discussion covers methods without learning, which mainly use random projections followed by quantization, as well as sampling methods. The resulting vectors can be applied in similarity search, machine learning, and other algorithms.
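To make the idea of random projection followed by quantization concrete, the following is a minimal sketch, not necessarily the specific scheme analyzed in the review. It assumes Gaussian random directions and one-bit sign quantization, and it relies on the standard fact that a random hyperplane separates two vectors with probability equal to their angle divided by pi. All function names and parameters (`make_projection`, `binarize`, `estimate_angle`, `n_bits`, `seed`) are hypothetical and chosen for illustration only.

```python
import math
import random


def make_projection(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian directions in R^dim (illustrative helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def binarize(x, projection):
    """Quantize each random projection of x to one bit: 1 if positive, else 0."""
    return [
        1 if sum(r_i * x_i for r_i, x_i in zip(row, x)) > 0 else 0
        for row in projection
    ]


def hamming(a, b):
    """Number of positions where two binary vectors differ."""
    return sum(u != v for u, v in zip(a, b))


def estimate_angle(a, b):
    """Each bit differs with probability angle / pi, so invert that relation."""
    return math.pi * hamming(a, b) / len(a)


def true_angle(x, y):
    """Exact angle between the original real-valued vectors, for comparison."""
    dot = sum(p * q for p, q in zip(x, y))
    norm_x = math.sqrt(sum(p * p for p in x))
    norm_y = math.sqrt(sum(q * q for q in y))
    return math.acos(max(-1.0, min(1.0, dot / (norm_x * norm_y))))


# Usage: compare the Hamming-based estimate with the exact angle.
rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(50)]
y = [x_i + 0.5 * rng.gauss(0.0, 1.0) for x_i in x]  # a noisy copy of x

projection = make_projection(dim=50, n_bits=2048)
bits_x, bits_y = binarize(x, projection), binarize(y, projection)
print("exact angle:    ", true_angle(x, y))
print("estimated angle:", estimate_angle(bits_x, bits_y))
```

The estimate only needs the Hamming distance between the two binary vectors, which is why such codes are attractive for similarity search. Its accuracy improves as `n_bits` grows, at the cost of longer codes.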

Author information

Correspondence to D. A. Rachkovskij.

Additional information

Translated from Kibernetika i Sistemnyi Analiz, No. 1, January–February, 2017, pp. 160–183.

About this article

Cite this article

Rachkovskij, D.A. Binary Vectors for Fast Distance and Similarity Estimation. Cybern Syst Anal 53, 138–156 (2017). https://doi.org/10.1007/s10559-017-9914-x

  • DOI: https://doi.org/10.1007/s10559-017-9914-x
