# Nearest Neighbor Queries in Metric Spaces

## Authors

K. Clarkson

DOI: 10.1007/PL00009449

Cite this article as: Clarkson, K. Discrete Comput Geom (1999) 22: 63. doi:10.1007/PL00009449

## Abstract

Given a set \(S\) of \(n\) sites (points) and a distance measure \(d\), the *nearest neighbor searching* problem is to build a data structure so that, given a query point \(q\), the site nearest to \(q\) can be found quickly. This paper gives data structures for this problem when the sites and queries are in a metric space. One data structure, \(D(S)\), uses a divide-and-conquer recursion. The other data structure, \(M(S,Q)\), is somewhat like a skiplist. Both are simple and implementable. The data structures are analyzed when the metric space obeys a certain sphere-packing bound, and when the sites and query points are random and have distributions with an exchangeability property. This property implies, for example, that the query point \(q\) is a random element of \(S \cup \{q\}\). Under these conditions, the preprocessing and space bounds for the algorithms are close to linear in \(n\). They also depend on the sphere-packing bound and on the logarithm of the *distance ratio* \(\Upsilon(S)\) of \(S\), the ratio of the distance between the farthest pair of points in \(S\) to the distance between the closest pair. The data structure \(M(S,Q)\) requires as input data an additional set \(Q\), taken to be representative of the query points. The resource bounds of \(M(S,Q)\) have a dependence on the distance ratio of \(S \cup Q\). While \(M(S,Q)\) can return wrong answers, its failure probability can be bounded, and is decreasing in a parameter \(K\). Here \(K \le |Q|/n\) is chosen when building \(M(S,Q)\). The expected query time for \(M(S,Q)\) is \(O(K \log n) \log \Upsilon(S \cup Q)\), and the resource bounds increase linearly in \(K\). The data structure \(D(S)\) has expected \(O(\log n)^{O(1)}\) query time, for fixed distance ratio. The preprocessing algorithm for \(M(S,Q)\) can be used to solve the all nearest neighbor problem for \(S\) in \(O(n (\log n)^2 (\log \Upsilon(S))^2)\) expected time.
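To make the problem statement and the distance ratio concrete, here is a minimal brute-force sketch in Python. It is *not* the paper's \(D(S)\) or \(M(S,Q)\). It is the naive baseline those structures are meant to improve on. A query scans all \(n\) sites, the all nearest neighbor problem takes about \(n^2\) distance evaluations, and \(\Upsilon(S)\) is computed directly from its definition over all pairs. The function names `nearest_site`, `all_nearest_neighbors`, and `distance_ratio` are hypothetical. The distance `d` is assumed to be a metric supplied by the caller, and the sites are assumed distinct so that the closest-pair distance is positive.

```python
import math
from itertools import combinations
from typing import Callable, List, Sequence, TypeVar

# Hypothetical helpers for illustration only; not the paper's D(S) or M(S,Q).
P = TypeVar("P")                    # a point of the metric space (any type)
Metric = Callable[[P, P], float]    # assumption: caller supplies a metric d


def nearest_site(sites: Sequence[P], q: P, d: Metric) -> P:
    """Return the site nearest to q by scanning all n sites (n distance calls)."""
    if not sites:
        raise ValueError("need at least one site")
    return min(sites, key=lambda s: d(s, q))


def all_nearest_neighbors(sites: Sequence[P], d: Metric) -> List[int]:
    """For each site, the index of its nearest *other* site (about n^2 distance calls)."""
    n = len(sites)
    if n < 2:
        raise ValueError("need at least two sites")
    return [
        min((j for j in range(n) if j != i), key=lambda j: d(sites[i], sites[j]))
        for i in range(n)
    ]


def distance_ratio(sites: Sequence[P], d: Metric) -> float:
    """Upsilon(S): farthest-pair distance divided by closest-pair distance."""
    dists = [d(a, b) for a, b in combinations(sites, 2)]
    if not dists or min(dists) == 0:
        raise ValueError("need at least two distinct sites")
    return max(dists) / min(dists)


if __name__ == "__main__":
    # Euclidean distance on the plane is one example of a metric.
    S = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0), (4.0, 2.0)]
    print(nearest_site(S, (3.5, 0.5), math.dist))    # (4.0, 0.0)
    print(all_nearest_neighbors(S, math.dist))       # [1, 0, 3, 2]
    print(distance_ratio(S, math.dist))              # sqrt(20), about 4.47
```

With Euclidean distance (`math.dist`, Python 3.8+) on the four sample points, the query \((3.5, 0.5)\) returns \((4.0, 0.0)\), and the all nearest neighbor answer is `[1, 0, 3, 2]`. The exhaustive scan makes no assumptions about the metric or the input distribution, but its cost is always linear per query and quadratic for all nearest neighbors. The near-linear preprocessing and \(O(n (\log n)^2 (\log \Upsilon(S))^2)\) all nearest neighbor bounds in the abstract are stated under the paper's sphere-packing and exchangeability conditions.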