# Nearest Neighbors Can Be Found Efficiently If the Dimension Is Small Relative to the Input Size

## Abstract

We consider the problem of nearest-neighbor search for a set of *n* data points in *d*-dimensional Euclidean space. We propose a simple, practical data structure, which is essentially a directed acyclic graph in which each node has at most two outgoing arcs. We analyze the performance of this data structure for the setting in which the *n* data points are chosen independently from a *d*-dimensional ball under the uniform distribution. In the average case, for fixed dimension *d*, we achieve a query time of *O*(log^{2} *n*) using only *O*(*n*) storage space. For variable dimension, both the query time and the storage space are multiplied by a dimension-dependent factor that is at most exponential in *d*. This is an improvement over previously known time-space tradeoffs, which all have a super-exponential factor of at least *d*^{Θ(*d*)} either in the query time or in the storage space. Our data structure can be stored efficiently in secondary memory: in a standard secondary-memory model, for fixed dimension *d*, we achieve average-case bounds of *O*((log^{2} *n*)/*B* + log *n*) query time and *O*(*N*) storage space, where *B* is the block-size parameter and *N* = *n*/*B*. Our data structure is not limited to Euclidean space; its definition generalizes to all possible choices of query objects, data objects, and distance functions.
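To make the problem setting concrete, the following is a minimal Python sketch of the input model and of a naive baseline. It is not the data structure proposed in the paper, whose construction the abstract does not give. It only draws *n* points uniformly from a *d*-dimensional unit ball and answers a query by a linear scan in *O*(*nd*) time, which is the cost that a sublinear-query structure aims to beat. The function names `sample_ball`, `squared_distance`, and `nearest_neighbor` are hypothetical and chosen for illustration.

```python
import math
import random


def sample_ball(d, rng):
    """Draw one point uniformly at random from the unit ball in R^d.

    A standard Gaussian vector gives a uniformly random direction;
    scaling the radius by U**(1/d) makes the volume distribution uniform.
    """
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in g))
    r = rng.random() ** (1.0 / d)
    return [r * x / norm for x in g]


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest_neighbor(points, query):
    """Baseline linear scan: index of the data point closest to the query."""
    return min(range(len(points)),
               key=lambda i: squared_distance(points[i], query))


if __name__ == "__main__":
    rng = random.Random(0)      # fixed seed, only for reproducibility
    n, d = 1000, 3              # illustrative sizes, not from the paper
    points = [sample_ball(d, rng) for _ in range(n)]
    query = sample_ball(d, rng)
    i = nearest_neighbor(points, query)
    print("nearest index:", i)
    print("distance:", math.sqrt(squared_distance(points[i], query)))
```

Because the scan touches every data point, it uses *O*(*n*) storage but *O*(*n*) distance computations per query, in contrast to the *O*(log^{2} *n*) average-case query time claimed above for fixed *d*.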
