Lower Bounds on Performance of Metric Tree Indexing Schemes for Exact Similarity Search in High Dimensions

Algorithmica

Abstract

Within a mathematically rigorous model, we analyse the curse of dimensionality for deterministic exact similarity search in the context of a popular class of indexing schemes: metric trees. The datasets $X$ are sampled randomly from a domain $\Omega$ equipped with a distance $\rho$ and an underlying probability distribution $\mu$. In the asymptotic analysis, we send the intrinsic dimension $d$ of $\Omega$ to infinity and assume that the size $n$ of a dataset grows superpolynomially yet subexponentially in $d$. Exact similarity search refers to finding the nearest neighbour in the dataset $X$ to a query point $\omega \in \Omega$, where the query points follow the same probability distribution $\mu$ as the datapoints. Let $\mathcal{F}$ denote the class of all 1-Lipschitz functions on $\Omega$ that can be used as decision functions in constructing a hierarchical metric tree indexing scheme. Suppose the VC dimension of the class of all sets $\{\omega : f(\omega) \ge a\}$, $f \in \mathcal{F}$, $a \in \mathbb{R}$, is $o(n^{1/4}/\log^2 n)$. (In view of a 1995 result of Goldberg and Jerrum, even the stronger complexity assumption $d^{O(1)}$ is reasonable.) We deduce an $\Omega(n^{1/4})$ lower bound on the expected average-case performance of hierarchical metric-tree-based indexing schemes for exact similarity search in $(\Omega, X)$. In particular, this bound is superpolynomial in $d$.
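The metric-tree setting described above can be made concrete with a small experiment. The sketch below is a toy vantage-point-style tree, not the construction analysed in the paper, and every choice in it is an assumption made for illustration: Euclidean distance on the unit cube $[0,1]^d$, with the uniform distribution playing the role of $\mu$; decision functions $f(\omega) = \rho(\omega, p)$ for a randomly chosen pivot $p$ (1-Lipschitz by the triangle inequality); median thresholds; and search cost measured as the number of distance evaluations. The names `build`, `nearest` and `average_cost` are hypothetical. The 1-Lipschitz property is what justifies the pruning test inside `nearest`.

```python
import math
import random
import statistics
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    """Node of a toy metric tree (hypothetical structure, for illustration only)."""
    pivot: Optional[Point] = None          # decision function f(w) = rho(w, pivot)
    threshold: float = 0.0                 # split value a: left holds f < a, right holds f >= a
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    bucket: Optional[List[Point]] = None   # datapoints stored at a leaf


def build(points: List[Point], leaf_size: int, rng: random.Random) -> Node:
    """Recursively split the data at the median distance to a random pivot."""
    if len(points) <= leaf_size:
        return Node(bucket=points)
    pivot = rng.choice(points)
    values = [math.dist(p, pivot) for p in points]
    a = statistics.median(values)
    left = [p for p, v in zip(points, values) if v < a]
    right = [p for p, v in zip(points, values) if v >= a]
    if not left or not right:              # degenerate split: keep everything in a leaf
        return Node(bucket=points)
    return Node(pivot, a, build(left, leaf_size, rng), build(right, leaf_size, rng))


def nearest(root: Node, q: Point) -> Tuple[float, int]:
    """Exact nearest-neighbour distance and the number of distance evaluations used."""
    best, cost = math.inf, 0

    def visit(node: Node) -> None:
        nonlocal best, cost
        if node.bucket is not None:        # leaf: scan the bucket
            for p in node.bucket:
                cost += 1
                best = min(best, math.dist(q, p))
            return
        cost += 1
        fq = math.dist(q, node.pivot)      # evaluate the decision function at q
        if fq < node.threshold:
            near, far = node.left, node.right
        else:
            near, far = node.right, node.left
        visit(near)
        # f is 1-Lipschitz, so every point x on the far side satisfies
        # rho(q, x) >= |f(q) - f(x)| >= |f(q) - a|; skip it if that cannot beat best.
        if abs(fq - node.threshold) < best:
            visit(far)

    visit(root)
    return best, cost


def average_cost(d: int, n: int = 1000, queries: int = 50, seed: int = 0) -> float:
    """Mean search cost for n uniform datapoints and uniform queries in [0, 1]^d."""
    rng = random.Random(seed)

    def sample() -> Point:
        return tuple(rng.random() for _ in range(d))

    data = [sample() for _ in range(n)]
    root = build(data, leaf_size=8, rng=rng)
    return sum(nearest(root, sample())[1] for _ in range(queries)) / queries


if __name__ == "__main__":
    for d in (2, 8, 32, 128):
        print(f"d={d:4d}  mean distance evaluations per query: {average_cost(d):.1f}")
```

Running the script prints the mean number of distance evaluations per query for several values of $d$. Here $n$ is held fixed rather than growing with $d$ as in the asymptotic regime of the paper, so the output can only illustrate the trend one would expect (pruning becoming ineffective as $d$ grows, with the cost drifting towards a full scan); it neither proves nor measures the $\Omega(n^{1/4})$ bound.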

Figures (images not included): Fig. 1–4, Algorithm 1, Fig. 5–8.

References

  1. Andoni, A., Indyk, P., Pătraşcu, M.: On the optimality of the dimensionality reduction method. In: Proc. 47th IEEE Symp. on Foundations of Computer Science, pp. 449–458 (2006)

  2. Barkol, O., Rabani, Y.: Tighter lower bounds for nearest neighbor search and related problems in the cell probe model. In: Proc. 32nd ACM Symp. on the Theory of Computing, pp. 388–396 (2000)

  3. Beyer, K., Goldstein, J., Ramakrishnan, R., Shaft, U.: When is “nearest neighbor” meaningful? In: Proc. 7th Intern. Conf. on Database Theory, ICDT’99, Jerusalem, pp. 217–235 (1999)

  4. Borodin, A., Ostrovsky, R., Rabani, Y.: Lower bounds for high-dimensional nearest neighbor search and related problems. In: Proc. 31st Annual ACM Symp. on Theory of Computing, pp. 312–321 (1999)

  5. Bustos, B., Navarro, G., Chávez, E.: Pivot selection techniques for proximity searching in metric spaces. Pattern Recognit. Lett. 24, 2357–2366 (2003)

  6. Chávez, E., Navarro, G.: A compact space decomposition for effective metric indexing. Pattern Recognit. Lett. 26, 1363–1376 (2005)

  7. Chávez, E., Navarro, G., Baeza-Yates, R., Marroquín, J.L.: Searching in metric spaces. ACM Comput. Surv. 33, 273–321 (2001)

  8. Ciaccia, P., Patella, M., Zezula, P.: M-tree: An efficient access method for similarity search in metric spaces. In: Proc. 23rd Int. Conf. on Very Large Data Bases, VLDB’97, Athens, Greece, pp. 426–435 (1997)

  9. Ciaccia, P., Patella, M., Zezula, P.: A cost model for similarity queries in metric spaces. In: Proc. 17th ACM Symposium on Principles of Database Systems, PODS’98, Seattle, WA, pp. 59–68 (1998)

  10. Clarkson, K.L.: An algorithm for approximate closest-point queries. In: Proc. 10th Symp. Comp. Geom., Stony Brook, New York, pp. 160–164 (1994)

  11. Clarkson, K.L.: Nearest-neighbor searching and metric space dimensions. In: Nearest-Neighbor Methods in Learning and Vision: Theory and Practice, pp. 15–59. MIT Press, Cambridge, MA (2006)

  12. Faragó, A., Linder, T., Lugosi, G.: Fast nearest neighbor search in dissimilarity spaces. IEEE Trans. Pattern Anal. Mach. Intell. 15, 957–962 (1993)

  13. Goldberg, P.W., Jerrum, M.R.: Bounding the Vapnik–Chervonenkis dimension of concept classes parametrised by real numbers. Mach. Learn. 18, 131–148 (1995)

  14. Hellerstein, J.M., Koutsoupias, E., Miranker, D.P., Papadimitriou, C., Samoladas, V.: On a model of indexability and its bounds for range queries. J. ACM 49(1), 35–55 (2002)

  15. Indyk, P.: Nearest neighbours in high-dimensional spaces. In: Goodman, J.E., O’Rourke, J. (eds.) Handbook of Discrete and Computational Geometry, pp. 877–892. Chapman and Hall/CRC, Boca Raton/London/New York/Washington (2004)

  16. Indyk, P., Motwani, R.: Approximate nearest neighbors: towards removing the curse of dimensionality. In: Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, Dallas, Texas, pp. 604–613 (1998)

  17. Katayama, N., Satoh, S.: The SR-tree: An index structure for high-dimensional nearest neighbour queries. In: Proc. 16th Symposium on PODS, pp. 369–380 (1997)

  18. Kushilevitz, E., Ostrovsky, R., Rabani, Y.: Efficient search for approximate nearest neighbor in high dimensional spaces. SIAM J. Comput. 30, 457–474 (2000)

  19. Ledoux, M.: The Concentration of Measure Phenomenon. Mathematical Surveys and Monographs, vol. 89. American Mathematical Society, Providence (2001)

  20. Milman, V.D., Schechtman, G.: Asymptotic Theory of Finite Dimensional Normed Spaces. Lecture Notes in Mathematics, vol. 1200. Springer, Berlin (1986)

  21. Miltersen, P.B.: Cell probe complexity—a survey. In: 19th Conference on the Foundations of Software Technology and Theoretical Computer Science, FSTTCS (1999). Advances in Data Structures Workshop

  22. Navarro, G.: Searching in metric spaces by spatial approximation. VLDB J. 11, 28–46 (2002)

  23. Navarro, G.: Analysing metric space indexes: what for? Invited paper. In: Proc. 2nd Int. Workshop on Similarity Search and Applications, SISAP 2009, Prague, Czech Republic, pp. 3–10 (2009)

  24. Navarro, G., Reyes, N.: Dynamic spatial approximation trees for massive data. In: Proc. 2nd Int. Workshop on Similarity Search and Applications, SISAP 2009, Prague, Czech Republic, pp. 81–88 (2009)

  25. Panigrahy, R., Talwar, K., Wieder, U.: A geometric approach to lower bounds for approximate near-neighbor search and partial match. In: Proc. 49th IEEE Symp. on Foundations of Computer Science, pp. 414–423 (2008)

  26. Panigrahy, R., Talwar, K., Wieder, U.: Lower bounds on near neighbor search via metric expansion. In: Foundations of Computer Science, FOCS, pp. 805–814 (2010)

  27. Pătraşcu, M., Thorup, M.: Higher lower bounds for near-neighbor and further rich problems. In: Proc. 47th IEEE Symp. on Foundations of Computer Science, pp. 646–654 (2006)

  28. Pestov, V.: On the geometry of similarity search: dimensionality curse and concentration of measure. Inf. Process. Lett. 73, 47–51 (2000)

  29. Pestov, V.: An axiomatic approach to intrinsic dimension of a dataset. Neural Netw. 21, 204–213 (2008)

  30. Pestov, V.: Indexability, concentration, and VC theory. J. Discrete Algorithms (2012). doi:10.1016/j.jda.2011.10.002

  31. Pestov, V., Stojmirović, A.: Indexing schemes for similarity search: an illustrated paradigm. Fundam. Inform. 70, 367–385 (2006)

  32. Samet, H.: Foundations of Multidimensional and Metric Data Structures. Morgan Kaufmann, San Francisco (2005)

  33. Santini, S.: Exploratory Image Databases: Content-Based Retrieval. Academic Press, New York (2001)

  34. Shaft, U., Ramakrishnan, R.: Theory of nearest neighbors indexability. ACM Trans. Database Syst. 31, 814–838 (2006)

  35. Uhlmann, J.K.: Satisfying general proximity/similarity queries with metric trees. Inf. Process. Lett. 40, 175–179 (1991)

  36. Vapnik, V.N.: Statistical Learning Theory. Wiley, New York (1998)

  37. Vempala, S.S.: The Random Projection Method. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, vol. 65. American Mathematical Society, Providence (2004)

  38. Vidyasagar, M.: Learning and Generalization, with Applications to Neural Networks, 2nd edn. Springer, London (2003)

  39. Volnyansky, I., Pestov, V.: Curse of dimensionality in pivot-based indexes. In: Proc. 2nd Int. Workshop on Similarity Search and Applications, SISAP 2009, Prague, Czech Republic, pp. 39–46 (2009)

  40. Weber, R., Schek, H.-J., Blott, S.: A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In: Proceedings of the 24th VLDB Conference, New York, pp. 194–205 (1998)

  41. White, D.A., Jain, R.: Similarity indexing with the SS-tree. In: Proc. 12th Conf. on Data Engineering, ICDE’96, La Jolla, CA, pp. 516–523 (1996)

  42. Yianilos, P.: Data structures and algorithms for nearest neighbor search in general metric spaces. In: Proc. 4th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 311–321 (1993)

  43. Zezula, P., Amato, G., Dohnal, V., Batko, M.: Similarity Search: The Metric Space Approach. Springer, New York (2006)

Author information

Correspondence to Vladimir Pestov.

About this article

Cite this article

Pestov, V. Lower Bounds on Performance of Metric Tree Indexing Schemes for Exact Similarity Search in High Dimensions. Algorithmica 66, 310–328 (2013). https://doi.org/10.1007/s00453-012-9638-2

  • DOI: https://doi.org/10.1007/s00453-012-9638-2
