Distance-Based Index Structures for Fast Similarity Search

Cybernetics and Systems Analysis

Abstract

This review considers a class of index structures for fast similarity search that are built and queried using only the values or ranks of distances (or similarities) between objects. Search by metric distances (those satisfying the triangle inequality and the other metric axioms) and by nonmetric distances is discussed. Both structures that return the exact answer to a search query and structures for approximate similarity search are presented; the latter do not guarantee exact results, but they usually return results close to exact and run faster than structures for exact search. General principles for constructing and applying such index structures are stated, and the ideas underlying specific algorithms, both well known and recently proposed, are considered.
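As an illustration of how a distance-based index can exploit the triangle inequality, the following minimal sketch shows pivot-based filtering for a range query. It is not taken from the article: the function names (`build_pivot_table`, `range_search`), the single-pivot design, and the use of 2-D Euclidean points are assumptions chosen for brevity. For a metric d, the bound |d(q, p) - d(o, p)| <= d(q, o) lets the search discard an object o without computing d(q, o) whenever the left-hand side exceeds the query radius.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Distance = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance, used here as an example of a metric."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def build_pivot_table(data: Sequence[Point], pivot: Point, dist: Distance) -> List[float]:
    """Index construction: store the distance from every object to the pivot."""
    return [dist(obj, pivot) for obj in data]


def range_search(
    data: Sequence[Point],
    table: Sequence[float],
    pivot: Point,
    query: Point,
    radius: float,
    dist: Distance,
) -> Tuple[List[Point], int]:
    """Return all objects within `radius` of `query` and the number of
    distance computations performed at query time."""
    d_qp = dist(query, pivot)
    computed = 1  # the query-to-pivot distance
    hits: List[Point] = []
    for obj, d_op in zip(data, table):
        # Triangle inequality: |d(q,p) - d(o,p)| is a lower bound on d(q,o).
        # If the bound already exceeds the radius, obj cannot be a hit.
        if abs(d_qp - d_op) > radius:
            continue
        computed += 1
        if dist(query, obj) <= radius:
            hits.append(obj)
    return hits, computed


# Hypothetical toy data, for illustration only.
data = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (10.0, 10.0), (0.5, 0.5)]
pivot = (0.0, 0.0)
table = build_pivot_table(data, pivot, euclidean)
hits, computed = range_search(data, table, pivot, (1.0, 1.0), 1.0, euclidean)
print(hits, computed)
```

The filter only ever skips objects that cannot be in the answer, so the result is exact. The saving comes from replacing some evaluations of a possibly expensive distance with a subtraction of stored values, and it relies on the distance being a metric.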

Author information

Corresponding author

Correspondence to D. A. Rachkovskij.

Additional information

Translated from Kibernetika i Sistemnyi Analiz, No. 4, July–August, 2017, pp. 165–192.

Cite this article

Rachkovskij, D.A. Distance-Based Index Structures for Fast Similarity Search. Cybern Syst Anal 53, 636–658 (2017). https://doi.org/10.1007/s10559-017-9966-y

  • DOI: https://doi.org/10.1007/s10559-017-9966-y
