Skip to main content

A Probabilistic Spell for the Curse of Dimensionality

  • Conference paper
  • First Online:
Algorithm Engineering and Experimentation (ALENEX 2001)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2153))

Included in the following conference series:

Abstract

Range searches in metric spaces can be very difficult if the space is “high dimensional”, i.e. when the histogram of distances has a large mean and/or a small variance. This so-called “curse of dimensionality”, well known in vector spaces, is also observed in metric spaces. There are at least two reasons behind the curse of dimensionality: a large search radius and/or a high intrinsic dimension of the metric space. We present a general probabilistic framework based on stretching the triangle inequality, whose direct effect is a reduction of the effective search radius. The technique gets more effective as the dimension grows, and the basic principle can be applied to any search algorithm. In this paper we apply it to a particular class of indexing algorithms. We present an analysis which helps understand the process, as well as empirical evidence showing dramatic improvements in the search time at the cost of a very small error probability.

This work has been partially supported by CYTED VII.13 AMYRI Project (both authors), CONACyT grant R-28923A (first author) and FONDECYT Project 1-000929 (second author).

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. S. Arya, D. Mount, N. Netanyahu, R. Silverman, and A. Wu. An optimal algorithm for approximate nearest neighbor searching in fixed dimension. In Proc. SODA’ 94, pages 573–583, 1994.

    Google Scholar 

  2. J. Bentley. Multidimensional binary search trees in database applications. IEEE Transactions on Software Engineering, 5(4):333–340, 1979.

    Article  Google Scholar 

  3. S. Berchtold, D. Keim, and H. Kriegel. The X-tree: an index structure for high-dimensional data. In Proc. VLDB’96, pages 28–39, 1996.

    Google Scholar 

  4. E. Chávez, J. Marroquýn, and G. Navarro. Fixed queries array: a fast and economical data structure for proximity searching. Multimedia Tools and Applications, 2001. To appear.

    Google Scholar 

  5. E. Chávez and G. Navarro. Measuring the dimensionality of general metric spaces. Tech.Rep. TR/DCC-00-1, Dept. of Computer Science, Univ. of Chile, 2000. Submitted, ftp://ftp.dcc.uchile.cl/pub/users/gnavarro/metricmodel.ps.gz

  6. E. Chávez and G. Navarro. A probabilistic spell for the curse of dimensionality. Tech.Rep. TR/DCC-00-2, Dept. of Computer Science, Univ. of Chile, 2000. ftp://ftp.dcc.uchile.cl/pub/users/gnavarro/probmetric.ps.gz

  7. E. Chávez, G. Navarro, R. Baeza-Yates, and J. Marroquýn. Searching in metric spaces. ACM ComputingSurveys, 2001.Toappear. ftp://ftp.dcc.uchile.cl/pub/users/gnavarro/survmetric.ps.gz

  8. B. Chazelle. Computational geometry: a retrospective. In Proc. ACM STOC’94, pages 75–94, 1994.

    Google Scholar 

  9. K. Clarkson. Nearest neighbor queries in metric spaces. Discrete Computational Geometry, 22(1):63–93, 1999.

    Article  MATH  MathSciNet  Google Scholar 

  10. A. Guttman. R-trees: a dynamic index structure for spatial searching. In Proc. ACM SIGMOD’84, pages 47–57, 1984.

    Google Scholar 

  11. D. Harman. Overview of the Third Text REtrieval Conference. In Proc. TREC-3,pages 1–19, 1995. NIST Special Publication 500-207.

    Google Scholar 

  12. L. Micó, J. Oncina, and E. Vidal. A new version of the nearest-neighbor approximating and eliminating search (aesa) with linear preprocessing-time and memory requirements. Pattern Recognition Letters, 15:9–17, 1994.

    Article  Google Scholar 

  13. D. White and R. Jain. Algorithms and strategies for similarity retrieval. Tech.Rep. VCL-96-101, Visual Comp. Lab., Univ. of California, La Jolla, CA, July 1996.

    Google Scholar 

  14. P. Yianilos. Excluded middle vantage point forests for nearest neighbor search. In DIMACS Implementation Challenge, ALENEX’99, Baltimore, MD, 1999.

    Google Scholar 

  15. Peter N. Yianilos. Locally lifting the curse of dimensionality for nearest neighbor search. Technical report, NEC Research Institute, Princeton, NJ, June 1999.

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chávez, E., Navarro, G. (2001). A Probabilistic Spell for the Curse of Dimensionality. In: Buchsbaum, A.L., Snoeyink, J. (eds) Algorithm Engineering and Experimentation. ALENEX 2001. Lecture Notes in Computer Science, vol 2153. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44808-X_12

Download citation

  • DOI: https://doi.org/10.1007/3-540-44808-X_12

  • Published:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42560-1

  • Online ISBN: 978-3-540-44808-2

  • eBook Packages: Springer Book Archive

Publish with us

Policies and ethics