Nearest Neighbors Can Be Found Efficiently If the Dimension Is Small Relative to the Input Size

  • Michiel Hagedoorn
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2572)

Abstract

We consider the problem of nearest-neighbor search for a set of n data points in d-dimensional Euclidean space. We propose a simple, practical data structure, which is essentially a directed acyclic graph in which each node has at most two outgoing arcs. We analyze the performance of this data structure for the setting in which the n data points are chosen independently from a d-dimensional ball under the uniform distribution. In the average case, for fixed dimension d, we achieve a query time of O(log² n) using only O(n) storage space. For variable dimension, both the query time and the storage space are multiplied by a dimension-dependent factor that is at most exponential in d. This is an improvement over previously known time-space tradeoffs, which all have a super-exponential factor of at least d^Θ(d) either in the query time or in the storage space. Our data structure can be stored efficiently in secondary memory: in a standard secondary-memory model, for fixed dimension d, we achieve average-case bounds of O((log² n)/B + log n) query time and O(N) storage space, where B is the block-size parameter and N = n/B. Our data structure is not limited to Euclidean space; its definition generalizes to all possible choices of query objects, data objects, and distance functions.
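The setting analyzed in the abstract can be illustrated with a minimal sketch. It is not the paper's data structure, whose construction is not given here. It only draws n points independently and uniformly from a d-dimensional unit ball and answers a query with a brute-force linear scan, which is the O(n) per-query baseline (for fixed d) that a sublinear-time structure aims to beat. The function names `sample_uniform_ball` and `nearest_neighbor_linear` are hypothetical and chosen for illustration.

```python
import math
import random


def sample_uniform_ball(n, d, rng):
    """Draw n points independently and uniformly from the unit ball in R^d."""
    points = []
    for _ in range(n):
        # A normalized Gaussian vector is uniformly distributed on the unit sphere.
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in g))
        # Scaling the radius by U^(1/d) makes the point uniform inside the ball.
        r = rng.random() ** (1.0 / d)
        points.append([r * x / norm for x in g])
    return points


def nearest_neighbor_linear(points, query):
    """Brute-force baseline: scan all n points, O(n * d) time per query."""
    best_index, best_dist2 = -1, math.inf
    for i, p in enumerate(points):
        dist2 = sum((a - b) ** 2 for a, b in zip(p, query))
        if dist2 < best_dist2:
            best_index, best_dist2 = i, dist2
    return best_index, math.sqrt(best_dist2)


# Illustrative usage: 1000 points in dimension 3, query at the ball's center.
rng = random.Random(0)
data = sample_uniform_ball(1000, 3, rng)
index, dist = nearest_neighbor_linear(data, [0.0, 0.0, 0.0])
```

The linear scan returns the exact nearest neighbor, so it can serve as a correctness check for any faster structure tested on the same random input.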

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Michiel Hagedoorn, Max-Planck-Institut für Informatik, Saarbrücken, Germany
