Binary Vectors for Fast Distance and Similarity Estimation

Rachkovskij, D. A.

doi:10.1007/s10559-017-9914-x

NEW TOOLS OF CYBERNETICS, INFORMATICS, COMPUTER ENGINEERING, AND SYSTEMS ANALYSIS
Published: 26 January 2017

Volume 53, pages 138–156, (2017)
Cybernetics and Systems Analysis

D. A. Rachkovskij¹

Abstract

This review considers methods and algorithms for fast estimation of distance and similarity measures between initial data items using vector representations with binary or integer-valued components derived from those items. The initial data are mainly high-dimensional vectors, compared by various distance measures (angular, Euclidean, and others) and similarity measures (cosine, inner product, and others). The review discusses methods without learning, mainly random projections followed by quantization, as well as sampling methods. The resulting vectors can be applied in similarity search, machine learning, and other algorithms.
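
As an illustration of random projection followed by quantization, the following is a minimal sketch of sign random projections, in which the fraction of differing bits between two binary codes estimates the angle between the original vectors. It is not taken from the article: the function names, the Gaussian projection matrix, the seed, and the use of NumPy are all assumptions made for the example.

```python
import numpy as np


def binary_sketch(X, n_bits, seed=0):
    """Map each row of X to a binary code via sign random projections.

    Assumption for this sketch: a dense Gaussian projection matrix,
    shared by all rows so that their codes are comparable.
    """
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((X.shape[1], n_bits))
    # Quantize each projection to one bit by keeping only its sign.
    return (X @ R) >= 0


def estimate_angle(code_a, code_b):
    """Estimate the angle (radians) between the original vectors.

    For a random hyperplane, the probability that two vectors fall on
    opposite sides equals angle / pi, so the normalized Hamming
    distance between the codes estimates angle / pi.
    """
    hamming_fraction = np.mean(code_a != code_b)
    return np.pi * hamming_fraction


def estimate_cosine(code_a, code_b):
    """Estimate cosine similarity from the estimated angle."""
    return np.cos(estimate_angle(code_a, code_b))


if __name__ == "__main__":
    d = 64
    x = np.zeros(d)
    x[0] = 1.0
    # y is constructed to make an angle of 60 degrees with x.
    y = np.zeros(d)
    y[0], y[1] = np.cos(np.pi / 3), np.sin(np.pi / 3)

    codes = binary_sketch(np.stack([x, y]), n_bits=4096)
    print("true cosine:     ", float(x @ y))
    print("estimated cosine:", float(estimate_cosine(codes[0], codes[1])))
```

The estimate becomes more accurate as `n_bits` grows, at the cost of longer codes; comparing two codes needs only a bitwise comparison and a count, which is what makes such representations attractive for fast similarity search.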

References

D. A. Rachkovskij, “Real-valued embeddings and sketches for fast distance and similarity estimation,” Cybernetics and Systems Analysis, Vol. 52, No. 6, 967–988 (2016).
M. Deza and E. Deza, Encyclopedia of Distances, Springer, Berlin-Heidelberg (2016).
M.-J. Lesot, M. Rifqi, and H. Benhadda, “Similarity measures for binary and numerical data: A survey,” Int. J. Knowledge Engineering and Soft Data Paradigms, Vol. 1, No. 1, 63–84 (2009).
S.-S. Choi, S.-H. Cha, and C. C. Tappert, “A survey of binary similarity and distance measures,” J. Systemics, Cybernetics and Informatics, Vol. 8, No. 1, 43–48 (2010).
W. B. Johnson and J. Lindenstrauss, “Extensions of Lipschitz mappings into a Hilbert space,” Contemporary Mathematics, Vol. 26, 189–206 (1984).
P. Indyk and R. Motwani, “Approximate nearest neighbors: Towards removing the curse of dimensionality,” in: Proc. 30th ACM Symp. Theory of Computing (1998), pp. 604–613.
S. S. Vempala, The Random Projection Method, American Math. Soc., Providence, R.I. (2004).
J. Matousek, “On variants of the Johnson–Lindenstrauss lemma,” Random Structures and Algorithms, Vol. 33, No. 2, 142–156 (2008).
A. Andoni, R. Krauthgamer, and I. P. Razenshteyn, “Sketching and embedding are equivalent for norms,” in: Proc. STOC’15 (2015), pp. 479–488.
T. Batu, F. Ergun, and C. Sahinalp, “Oblivious string embeddings and edit distance approximations,” in: Proc. SODA’06 (2006), pp. 792–801.
P. Indyk and A. Naor, “Nearest-neighbor-preserving embeddings,” ACM Trans. Algorithms, Vol. 3, No. 3, Article No. 31 (2007).
M. Goemans and D. Williamson, “Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming,” J. ACM, Vol. 42, No. 6, 1115–1145 (1995).
M. Charikar, “Similarity estimation techniques from rounding algorithms,” in: Proc. STOC’02, 380–388 (2002).
X. Yi, C. Caramanis, and E. Price, “Binary embedding: Fundamental limits and fast algorithm,” JMLR: W&CP, Vol. 37, 2162–2170 (2015).
G. S. Manku, A. Jain, and A. D. Sarma, “Detecting near-duplicates for web crawling,” in: Proc. WWW’07 (2007), pp. 141–150.
P. Li, T. J. Hastie, and K. W. Church, “Improving random projections using marginal information,” in: Proc. COLT’06 (2006), pp. 635–649.
F. X. Yu, A. Bhaskara, S. Kumar, Y. Gong, and S.-F. Chang, On Binary Embedding Using Circulant Matrices, arXiv:1511.06480 (2015).
D. A. Rachkovskij, I. S. Misuno, and S. V. Slipchenko, “Randomized projective methods for construction of binary sparse vector representations,” Cybernetics and Systems Analysis, Vol. 48, No. 1, 140–150 (2012).
D. A. Rachkovskij, “Estimation of vectors similarity by their randomized binary projections,” Cybernetics and Systems Analysis, Vol. 51, No. 5, 808–818 (2015).
G. W. Oehlert, “A note on the delta method,” The American Statistician, Vol. 46, No. 1, 27–29 (1992).
L. Jacques, J. N. Laska, P. T. Boufounos, and R. G. Baraniuk, “Robust 1-Bit compressive sensing via binary stable embeddings of sparse vectors,” IEEE Trans. Inf. Theory, Vol. 59, No. 4, 2082–2102 (2013).
L. Jacques, “A quantized Johnson–Lindenstrauss lemma: The finding of Buffon’s needle,” IEEE Trans. Inf. Theory, Vol. 61, No. 9, 5012–5027 (2015).
D. E. Knuth, “Big omicron and big omega and big theta,” ACM Sigact News, Vol. 8, No. 2, 18–24 (1976).
Z. Karnin, Y. Rabani, and A. Shpilka, “Explicit dimension reduction and its applications,” SIAM J. Comput., Vol. 41, No. 1, 219–249 (2012).
K. G. Larsen and J. Nelson, Optimality of the Johnson–Lindenstrauss Lemma, arXiv:1609.02094 (2016).
Y. Plan and R. Vershynin, “Dimension reduction by random hyperplane tessellations,” Discrete and Computational Geometry, Vol. 51, No. 2, 438–461 (2014).
S. Oymak and B. Recht, Near Optimal Bounds for Binary Embeddings of Arbitrary Sets, arXiv:1512.04433 (2015).
N. Ailon and B. Chazelle, “The Fast Johnson–Lindenstrauss transform and approximate nearest neighbors,” SIAM J. Comput., Vol. 39, No. 1, 302–322 (2009).
Q. Le, T. Sarlos, and A. J. Smola, “Fastfood - Computing Hilbert space expansions in loglinear time,” JMLR: W&CP, Vol. 28, No. 3, pp. 244–252 (2013).
S. Oymak, Near-Optimal Sample Complexity Bounds for Circulant Binary Embedding, arXiv:1603.03178 (2016).
S.-H. Hsieh, C.-S. Lu, and S.-C. Pei, “Fast binary embedding via circulant downsampled matrix: A data-independent approach,” in: Proc. ICIP’16 (2016).
A. Choromanska, K. Choromanski, M. Bojarski, T. Jebara, S. Kumar, and Y. LeCun, “Binary embeddings with structured hashed projections,” in: Proc. ICML’16 (2016), pp. 344–353.
S. Dirksen and A. Stollenwerk, Fast Binary Embeddings with Gaussian Circulant Matrices: Improved Bounds, arXiv:1608.06498 (2016).
P. Li, T. J. Hastie, and K. W. Church, “Very sparse random projections,” in: Proc. KDD’06 (2006), pp. 287–296.
D. A. Rachkovskij, “Formation of similarity-reflecting binary vectors with random binary projections,” Cybernetics and Systems Analysis, Vol. 51, No. 2, 313–323 (2015).
V. Korolev and I. Shevtsova, “An improvement of the Berry-Esseen inequality with applications to Poisson and mixed Poisson random sums,” Scandinavian Actuarial Journal, Vol. 2012, No. 2, 81–105 (2012).
Y. Gong, S. Kumar, H. A. Rowley, and S. Lazebnik, “Learning binary codes for high-dimensional data using bilinear projections,” in: Proc. CVPR’13 (2013), pp. 484–491.
X. Zhang, F. X. Yu, R. Guo, S. Kumar, S. Wang, and S.-F. Chang, “Fast orthogonal projection based on Kronecker product,” in: Proc. ICCV’15 (2015), pp. 2929–2937.
P. Indyk and R. Motwani, “Approximate nearest neighbors: Towards removing the curse of dimensionality,” in: Proc. 30th ACM Symp. Theory of Computing (1998), pp. 604–613.
A. Gionis, P. Indyk, and R. Motwani, “Similarity search in high dimensions via hashing,” in: Proc. VLDB’99 (1999), pp. 518–529.
A. Andoni and P. Indyk, “Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions,” Communications of the ACM, Vol. 51, No. 1, 117–122 (2008).
A. Andoni, “Nearest neighbor search: The old, new, and the impossible,” PhD thesis, Massachusetts Institute of Technology (2009).
J. Wang, H. T. Shen, J. Song, and J. Ji, Hashing for Similarity Search: A survey, arXiv:1408.2927 (2014).
P. Li, M. Mitzenmacher, and A. Shrivastava, “Coding for random projections,” in: Proc. ICML’14 (2014), pp. 676–684.
S. Shalev-Shwartz, Y. Singer, and N. Srebro, “Pegasos: Primal estimated sub-gradient solver for SVM,” in: Proc. ICML’2007 (2007), pp. 807–814.
R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin, “LIBLINEAR: A library for large linear classification,” Journal of Machine Learning Research, Vol. 9, 1871–1874 (2008).
T. Joachims, T. Finley, and C.-N. J. Yu, “Cutting-plane training of structural SVMs,” Machine Learning, Vol. 77, No. 1, 27–59 (2009).
T. Martinetz, K. Labusch, and D. Schneegass, “SoftDoubleMaxMinOver: Perceptron-like training of Support Vector Machines,” IEEE Transactions on Neural Networks, Vol. 20, No. 7, 1061–1072 (2009).
L. Bottou, “Large-scale machine learning with stochastic gradient descent,” in: Proc. COMPSTAT’10 (2010), pp. 177–187.
M. Datar, N. Immorlica, P. Indyk, and V. S. Mirrokni, “Locality-sensitive hashing scheme based on p-stable distributions,” in: Proc. SCG’04 (2004), pp. 253–262.
P. Li, M. Mitzenmacher, and A. Shrivastava, 2-Bit Random Projections, Nonlinear Estimators, and Approximate Near Neighbor Search, arXiv:1602.06577 (2016).
D. Gorisse, M. Cord, and F. Precioso, “Locality-sensitive hashing for chi2 distance,” IEEE Trans. PAMI, Vol. 34, No. 2, 402–409 (2012).
P. Li, G. Samorodnitsky, and J. Hopcroft, “Sign Cauchy projections and chi-square kernel,” in: Proc. NIPS’13, 2571–2579 (2013).
P. Li, Sign Stable Random Projections for Large-Scale Learning, arXiv:1504.07235 (2015).
A. Dasgupta, R. Kumar, and T. Sarlos, “Fast locality sensitive hashing,” in: Proc. SIGKDD’11 (2011), pp. 1073–1081.
L. Pauleve, H. Jegou, and L. Amsaleg, “Locality sensitive hashing: A comparison of hash function types and querying mechanisms,” Pattern Recognit. Lett., Vol. 31, No. 11, 1348–1358 (2010).
P. Li, “0-bit consistent weighted sampling,” in: Proc. KDD’15 (2015), pp. 665–674.
P. Li, A Comparison Study of Nonlinear Kernels, arXiv:1603.06541 (2016).
M. Manasse, F. McSherry, and K. Talwar, “Consistent weighted sampling,” Tech. Rep. MSR-TR-2010-73 (2010).
S. Ioffe, “Improved consistent sampling, weighted minhash and L1 sketching,” in: Proc. ICDM’10 (2010), pp. 246–255.
B. Haeupler, M. Manasse, and K. Talwar, Consistent Weighted Sampling Made Fast, Small, and Easy, arXiv:1410.4266 (2014).
A. Shrivastava, “Simple and efficient weighted minwise hashing,” in: Proc. NIPS’16 (2016).
M. Thorup, “Bottom-k and priority sampling, set similarity and subset sums with minimal independence,” in: Proc. STOC’13 (2013), pp. 371–378.
P. Li, Generalized Min-Max Kernel and Generalized Consistent Weighted Sampling, arXiv:1605.05721 (2016).
P. Li and C.-H. Zhang, Theory of the GMM Kernel, arXiv:1608.00550 (2016).
P. Li, Nystrom Method for Approximating the GMM Kernel, arXiv:1607.03475 (2016).
N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, Cambridge, UK (2000).
I. Steinwart and A. Christmann, Support Vector Machines, Springer, New York (2008).
T. Hofmann, B. Scholkopf, and A. Smola, “Kernel methods in machine learning,” Annals of Statistics, Vol. 36, No. 3, 1171–1220 (2008).
N. Shervashidze, P. Schweitzer, E. J. van Leeuwen, K. Mehlhorn, and K. M. Borgwardt, “Weisfeiler–Lehman graph kernels,” J. of Machine Learning Research, Vol. 12, 2539–2561 (2011).
M. M. Luqman, J. Y. Ramel, J. Llados, and T. Brouard, “Fuzzy multilevel graph embedding,” Pattern Recognition, Vol. 46, No. 2, 551–565 (2013).
L. Livi, A. Rizzi, and A. Sadeghian, “Optimized dissimilarity space embedding for labeled graphs,” Information Sciences, Vol. 266, 47–64 (2014).
M. Neumann, R. Garnett, C. Bauckhage, and K. Kersting, “Propagation kernels: Efficient graph kernels from propagated information,” Machine Learning, Vol. 102, No. 2, 209–245 (2016).
T. Gartner, J. Lloyd, and P. Flach, “Kernels and distances for structured data,” Machine Learning, Vol. 57, No. 3, 205–232 (2004).
K. Shin and T. Kuboyama, “A generalization of Haussler’s convolution kernel — Mapping kernel and its application to tree kernels,” J. Comput. Sci. Technol., Vol. 25, No. 5, 1040–1054 (2010).
G. Da San Martino, N. Navarin, and A. Sperduti, “A tree-based kernel for graphs,” in: Proc. ICDM’12 (2012), pp. 975–986.
N. Kriege and P. Mutzel, “Subgraph matching kernels for attributed graphs,” in: Proc. ICML’12 (2012), pp. 1015–1022.
A. Rahimi and B. Recht, “Random features for large-scale kernel machines,” in: Proc. NIPS’07 (2007), pp. 1177–1184.
M. Raginsky and S. Lazebnik, “Locality-sensitive binary codes from shift invariant kernels,” in: Proc. NIPS’09 (2009), pp. 1509–1517.
S. Kim and S. Choi, “Bilinear random projections for locality-sensitive binary codes,” in: Proc. CVPR’15 (2015), pp. 1338–1346.
B. Kulis and K. Grauman, “Kernelized locality-sensitive hashing,” IEEE Trans. PAMI, Vol. 34, No. 6, 1092–1104 (2012).
K. Jiang, Q. Que, and B. Kulis, “Revisiting kernelized locality-sensitive hashing for improved large-scale image retrieval,” in: Proc. CVPR’15 (2015), pp. 4933–4941.
H. Xia, P. Wu, S. C. Hoi, and R. Jin, “Boosting multi-kernel locality-sensitive hashing for scalable image retrieval,” in: Proc. SIGIR’12 (2012), pp. 55–64.
P. Li, A. Shrivastava, J. L. Moore, and A. C. König, “Hashing algorithms for large-scale learning,” in: Proc. NIPS’11 (2011), pp. 2672–2680.
P. Li and A. C. König, “Theory and applications of b-bit minwise hashing,” Communications of the ACM, Vol. 54, No. 8, 101–109 (2011).
E. Kushilevitz, R. Ostrovsky, and Y. Rabani, “Efficient search for approximate nearest neighbor in high dimensional spaces,” SIAM Journal on Computing, Vol. 30, No. 2, 457–474 (2000).
P. Li and K. W. Church, “A sketch algorithm for estimating two-way and multi-way associations,” Computational Linguistics, Vol. 33, No. 3, 305–354 (2007).
P. Li, K. W. Church, and T. J. Hastie, “One sketch for all: Theory and applications of conditional random sampling,” in: Proc. NIPS’08 (2008), pp. 953–960.
P. Flajolet and G. N. Martin, “Probabilistic counting algorithms for data base applications,” J. Comput. System Sci., Vol. 31, 182–209 (1985).
E. Cohen, “Size-estimation framework with applications to transitive closure and reachability,” J. Comput. System Sci., Vol. 55, 441–453 (1997).
E. Cohen, “All-distances sketches, revisited: HIP estimators for massive graphs analysis,” in: Proc. PODS’14 (2014), pp. 88–99.
A. Z. Broder, “On the resemblance and containment of documents,” in: Proc. SEQUENCES’97 (1997), pp. 21–29.
A. Z. Broder, S. C. Glassman, M. S. Manasse, and G. Zweig, “Syntactic clustering of the web,” Computer Networks and ISDN Systems, Vol. 29, Nos. 8–13, 1157–1166 (1997).
A. Z. Broder, M. Charikar, A. M. Frieze, and M. Mitzenmacher, “Min-wise independent permutations,” J. Comput. System Sci., Vol. 60, 327–336 (1998).
M. Mitzenmacher, R. Pagh, and N. Pham, “Efficient estimation for high similarities using odd sketches,” in: Proc. WWW’14 (2014), pp. 109–118.
P. Indyk, “A small approximately min-wise independent family of hash functions,” Journal of Algorithms, Vol. 38, No. 1, 84–90 (2001).
M. Patrascu and M. Thorup, “On the k-independence required by linear probing and minwise independence,” ACM Trans. Algorithms, Vol. 12, No. 1, 8:1–8:27 (2016).
S. Dahlgaard and M. Thorup, “Approximately minwise independence with twisted tabulation,” in: Proc. SWAT’14 (2014), pp. 134–145.
M. Thorup, Fast and Powerful Hashing Using Tabulation, arXiv:1505.01523 (2016).
M. Mitzenmacher and S. Vadhan, “Why simple hash functions work: Exploiting the entropy in a data stream,” in: Proc. SODA’08 (2008), pp. 746–755.
P. Li, A. B. Owen, and C.-H. Zhang, “One permutation hashing,” in: Proc. NIPS’12 (2012), pp. 3122–3130.
M. Charikar, K. Chen, and M. Farach-Colton, “Finding frequent items in data streams,” in: Proc. ICALP’02 (2002), pp. 693–703.
P. Flajolet, É. Fusy, O. Gandouet, and F. Meunier, “Hyperloglog: The analysis of a near-optimal cardinality estimation algorithm,” in: Proc. AofA’07 (2007), pp. 127–146.
A. Shrivastava and P. Li, “Densifying one permutation hashing via rotation for fast near neighbor search,” in: Proc. ICML’14 (2014), pp. 557–565.
A. Shrivastava and P. Li, “Improved densification of one permutation hashing,” in: Proc. UAI’14 (2014), pp. 732–741.
S. Dahlgaard, M. B. T. Knudsen, E. Rotenberg, and M. Thorup, “Hashing for statistics over k-partitions,” in: Proc. FOCS’15 (2015), pp. 1292–1310.
D. Valsesia, S. M. Fosson, C. Ravazzi, T. Bianchi, and E. Magli, “SparseHash: Embedding Jaccard coefficient between supports of signals,” in: ICME 2016 Workshops (2016), pp. 1–16.
E. M. Kussul, D. A. Rachkovskij, and T. N. Baidyk, “Associative-projective neural networks: Architecture, implementation, applications,” in: Proc. Neuro-Nimes’91 (1991), pp. 463–476.
D. A. Rachkovskij, E. M. Kussul, and T. N. Baidyk, “Building a world model with structure-sensitive sparse binary distributed representations,” Biologically Inspired Cognitive Architectures, Vol. 3, pp. 64–86 (2013).
D. Kleyko, E. Osipov, and D. A. Rachkovskij, “Modification of holographic graph neuron using sparse distributed representations,” in: Procedia Computer Science, Vol. 88, 39–45 (2016).
A. Kartashov, A. Frolov, A. Goltsev, and R. Folk, “Quality and efficiency of retrieval for Willshaw-like autoassociative networks: III. Willshaw–Potts model,” Network: Computation in Neural Systems, Vol. 8, No. 1, 71–86 (1997).
A. A. Frolov, D. A. Rachkovskij, and D. Husek, “On information characteristics of Willshaw-like auto-associative memory,” Neural Network World, Vol. 12, No. 2, 141–158 (2002).
A. A. Frolov, D. Husek, and D. A. Rachkovskij, “Time of searching for similar binary vectors in associative memory,” Cybernetics and Systems Analysis, Vol. 42, No. 5, 615–623 (2006).
K. Eshghi and M. Kafai, “Support Vector Machines with sparse binary high-dimensional feature vectors,” HPE-2016-30 (2016).
N. M. Amosov, T. N. Baidyk, A. D. Goltsev, A. M. Kasatkin, L. M. Kasatkina, E. M. Kussul, and D. A. Rachkovskij, Neurocomputers and Intelligent Robots [in Russian], Naukova Dumka, Kyiv (1991).
E. M. Kussul, D. A. Rachkovskij, and T. N. Baidyk, “On image texture recognition by an associative-projective neurocomputer,” in: Proc. ANNIE’91 (1991), pp. 453–458.
R. Donaldson, A. Gupta, Y. Plan, and T. Reimer, Random Mappings Designed for Commercial Search Engines, arXiv:1507.05929 (2015).
B. A. Olshausen and D. J. Field, “Sparse coding of sensory inputs,” Curr. Opin. Neurobiol., Vol. 14, 481–487 (2004).
S. Ahmad and J. Hawkins, How Do Neurons Operate on Sparse Distributed Representations? A Mathematical Theory of Sparsity, Neurons and Active Dendrites, arXiv:1601.00720 (2016).
I. S. Misuno, D. A. Rachkovskij, and S. V. Slipchenko, “Vector and distributed representations reflecting semantic relatedness of words,” Mathematical Machines and Systems, No. 3, 50–67 (2005).
I. S. Misuno, D. A. Rachkovskij, S. V. Slipchenko, and A. M. Sokolov, “Searching for text information with the help of vector representations,” Problems of Programming, No. 4, 50–59 (2005).
Q. Shi, J. Petterson, G. Dror, J. Langford, A. J. Smola, and S. V. N. Vishwanathan, “Hash kernels for structured data,” J. Mach. Learn. Res., Vol. 10, 2615–2637 (2009).
D. A. Rachkovskij, S. V. Slipchenko, E. M. Kussul, and T. N. Baidyk, “Sparse binary distributed encoding of scalars,” Journal of Automation and Information Sciences, Vol. 37, No. 6, 12–23 (2005).
D. A. Rachkovskij, S. V. Slipchenko, I. S. Misuno, E. M. Kussul, and T. N. Baidyk, “Sparse binary distributed encoding of numeric vectors,” Journal of Automation and Information Sciences, Vol. 37, No. 11, 47–61 (2005).
D. A. Rachkovskij, S. V. Slipchenko, E. M. Kussul, and T. N. Baidyk, “A binding procedure for distributed binary data representations,” Cybernetics and Systems Analysis, Vol. 41, No. 3, 319–331 (2005).
E. M. Kussul, D. A. Rachkovskij, and D. C. Wunsch, “The random subspace coarse coding scheme for real-valued vectors,” in: Proc. IJCNN’99 (1999), pp. 450–455.
D. A. Rachkovskij, S. V. Slipchenko, E. M. Kussul, and T. N. Baidyk, “Properties of numeric codes for the scheme of random subspaces RSC,” Cybernetics and Systems Analysis, Vol. 41, No. 4, 509–520 (2005).
K. Eshghi and M. Kafai, “The CRO Kernel: Using concomitant rank order hashes for sparse high dimensional randomized feature maps,” in: Proc. ICDE’16 (2016), pp. 721–730.
K. Forbus, R. Ferguson, A. Lovett, and D. Gentner, “Extending SME to handle large-scale cognitive modeling,” DOI: 10.1111/cogs.12377 (2016).
D. A. Rachkovskij and S. V. Slipchenko, “Similarity-based retrieval with structure-sensitive sparse binary distributed representations,” Computational Intelligence, Vol. 28, No. 1, 106–129 (2012).
D. A. Rachkovskij, “Some approaches to analogical mapping with structure sensitive distributed representations,” J. Experimental and Theoretical Artificial Intelligence, Vol. 16, No. 3, 125–145 (2004).
S. V. Slipchenko and D. A. Rachkovskij, “Analogical mapping using similarity of binary distributed representations,” Int. J. Information Theories and Applications, Vol. 16, No. 3, 269–290 (2009).
L. Jacques, Small Width, Low Distortions: Quasi-Isometric Embeddings with Quantized Sub-Gaussian Random Projections, arXiv:1504.06170 (2015).
L. Jacques and V. Cambareri, Time for Dithering: Fast and Quantized Random Embeddings via the Restricted Isometry Property, arXiv:1607.00816 (2016).
P. T. Boufounos, H. Mansour, S. Rane, and A. Vetro, “Dimensionality reduction of visual features for efficient retrieval and classification,” APSIPA Trans. on Signal and Information Processing, Vol. 5, No. e14, 1–14 (2016).
P. T. Boufounos, S. Rane, and H. Mansour, Representation and Coding of Signal Geometry, arXiv:1512.07636 (2015).
Q. Lv, M. Charikar, and K. Li, “Image similarity search with compact data structures,” in: Proc. CIKM’04 (2004), pp. 208–217.
Z. Wang, W. Dong, W. Josephson, Q. Lv, M. Charikar, and K. Li, “Sizing sketches: Rank-based analysis for similarity search,” in: Proc. SIGMETRICS’07 (2007), pp. 157–168.
W. Dong, M. Charikar, and K. Li, “Asymmetric distance estimation with sketches for similarity search in high-dimensional spaces,” in: Proc. SIGIR’08 (2008), pp. 123–130.
K. Min, L. Yang, J. Wright, L. Wu, X.-S. Hua, and Y. Ma, “Compact projection: Simple and efficient near neighbor search with practical memory requirements,” in: Proc. CVPR’10 (2010), pp. 3477–3484.
E. Chávez, G. Navarro, R. Baeza-Yates, and J. L. Marroquín, “Searching in metric spaces,” ACM Computing Surveys, Vol. 33, No. 3, 273–321 (2001).
P. Zezula, G. Amato, V. Dohnal, and M. Batko, Similarity Search: The Metric Space Approach, Springer, New York (2006).
G. R. Hjaltason and H. Samet, “Index-driven similarity search in metric spaces,” ACM Transactions on Database Systems, Vol. 28, No. 4, 517–580 (2003).
A. Becker, L. Ducas, N. Gama, and T. Laarhoven, “New directions in nearest neighbor searching with applications to lattice sieving,” in: Proc. SODA’16 (2016), pp. 10–24.
M. Muja and D. G. Lowe, “Scalable nearest neighbor algorithms for high dimensional data,” IEEE Trans. on PAMI, Vol. 36, No. 11, 2227–2240 (2014).
X. Zhang, J. Qin, W. Wang, Y. Sun, and J. Lu, “Hmsearch: An efficient Hamming distance query processing algorithm,” in: Proc. SSDBM’13 (2013), pp. 19:1–19:12.
M. Norouzi, A. Punjani, and D. J. Fleet, “Fast exact search in Hamming space with multi-index hashing,” IEEE Trans. PAMI, Vol. 36, No. 6, 1107–1119 (2014).
J. Song, H. T. Shen, J. Wang, Z. Huang, N. Sebe, and J. Wang, “A distance-computation-free search scheme for binary code databases,” IEEE Trans. Multimedia, Vol. 18, No. 3, 484–495 (2016).
N. Pham and R. Pagh, “Scalability and total recall with fast CoveringLSH,” in: Proc. CIKM’16 (2016).
Z. Jiang, L. Xie, X. Deng, W. Xu, and J. Wang, “Fast nearest neighbor search in the Hamming space,” in: Proc. MMM’16 (2016), pp. 325–336.
J. Wang, W. Liu, S. Kumar, and S.-F. Chang, “Learning to hash for indexing big data: A survey,” in: Proc. of the IEEE, Vol. 104, No. 1, 34–57 (2016).
J. Wang, T. Zhang, J. Song, N. Sebe, and H. T. Shen, A Survey on Learning to Hash, arXiv:1606.00185 (2016).

Author information

Authors and Affiliations

International Scientific-Educational Center of Information Technologies and Systems, NAS and MES of Ukraine, Kyiv, Ukraine
D. A. Rachkovskij

Corresponding author

Correspondence to D. A. Rachkovskij.

Additional information

Translated from Kibernetika i Sistemnyi Analiz, No. 1, January–February, 2017, pp. 160–183.

About this article

Cite this article

Rachkovskij, D.A. Binary Vectors for Fast Distance and Similarity Estimation. Cybern Syst Anal 53, 138–156 (2017). https://doi.org/10.1007/s10559-017-9914-x

Received: 17 May 2016
Published: 26 January 2017
Issue Date: January 2017
DOI: https://doi.org/10.1007/s10559-017-9914-x
