Automata, Languages and Programming

Volume 3142 of the series Lecture Notes in Computer Science, pp. 858-869

The Black-Box Complexity of Nearest Neighbor Search

  • Robert Krauthgamer, IBM Almaden Research Center
  • James R. Lee, Computer Science Division, U.C. Berkeley

We define a natural notion of efficiency for approximate nearest-neighbor (ANN) search in general n-point metric spaces: the existence of a randomized algorithm that answers (1+ε)-approximate nearest-neighbor queries in polylog(n) time using only polynomial space. We then study which families of metric spaces admit efficient ANN schemes in the black-box model, where the algorithm has only oracle access to the distance function and the query may be any point whose distances to the data set are consistent with the triangle inequality.
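To make the black-box model concrete, here is a minimal Python sketch, assuming the metric is exposed only through a distance callable. The names `CountingOracle` and `linear_scan_nn` are hypothetical and do not come from the paper. The linear scan is the trivial baseline: it finds the exact nearest neighbor but spends n oracle calls per query, which is exactly the cost an efficient scheme, with polylog(n) query time, must avoid.

```python
from typing import Callable, Hashable, Iterable, Optional, Tuple

Point = Hashable  # hypothetical: any hashable label identifying a point


class CountingOracle:
    """Black-box access to a metric: returns d(a, b) and counts each query."""

    def __init__(self, dist: Callable[[Point, Point], float]) -> None:
        self._dist = dist
        self.calls = 0

    def __call__(self, a: Point, b: Point) -> float:
        self.calls += 1
        return self._dist(a, b)


def linear_scan_nn(
    points: Iterable[Point], query: Point, oracle: CountingOracle
) -> Tuple[Optional[Point], float]:
    """Exact nearest neighbor by brute force: one oracle call per data point."""
    best: Optional[Point] = None
    best_d = float("inf")
    for p in points:
        d = oracle(query, p)
        if d < best_d:
            best, best_d = p, d
    return best, best_d
```

For example, with the points 0, 1, ..., 9 on the real line and `CountingOracle(lambda a, b: abs(a - b))`, the query 4.3 returns point 4 after exactly 10 oracle calls.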

For \(\varepsilon < \frac{2}{5}\), we give a complete answer to this problem. Using the notion of metric dimension defined in [GKL03] (à la [Ass83]), we show that a metric space X admits an efficient (1+ε)-ANN scheme for any \(\varepsilon < \frac{2}{5}\) if and only if \(\dim(X) = O(\log \log n)\). For coarser approximations, the upper bound clearly continues to hold, but there is a threshold at which our lower bound breaks down: this is precisely when points in the "ambient space" may begin to affect the complexity of "hard" subspaces \(S \subseteq X\). Indeed, we give examples showing that \(\dim(X)\) does not characterize the black-box complexity of ANN above the threshold.
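The dimension of [GKL03] is the doubling dimension: \(\dim(X) = \log_2 \lambda\), where λ is the smallest number such that every ball in X can be covered by λ balls of half the radius. The Python sketch below computes an upper bound on this quantity for a small finite metric by building greedy nets; the function name `doubling_dim_upper_bound` is hypothetical, the brute-force approach is chosen only for illustration, and nothing here is taken from the paper's algorithm.

```python
import math
from typing import Callable, Hashable, List, Sequence


def doubling_dim_upper_bound(
    points: Sequence[Hashable], dist: Callable[[Hashable, Hashable], float]
) -> float:
    """Upper bound on log2 of the doubling constant of a finite metric.

    For every center x and every radius r = d(x, y), greedily build an
    (r/2)-net of the closed ball B(x, r). Balls of radius r/2 around the
    net points cover B(x, r), so the largest net size bounds lambda above.
    """
    worst = 1
    for x in points:
        # A ball's point set only changes at radii d(x, y); within each such
        # interval the smallest radius is the hardest one to cover.
        for r in {dist(x, y) for y in points}:
            ball = [y for y in points if dist(x, y) <= r]
            centers: List[Hashable] = []
            for y in ball:
                # y becomes a new center unless an existing center covers it.
                if all(dist(y, c) > r / 2 for c in centers):
                    centers.append(y)
            worst = max(worst, len(centers))
    return math.log2(worst)
```

For the uniform metric on 8 points (all pairwise distances equal to 1), a ball of radius 1 contains all 8 points while a ball of radius 1/2 contains only its center, so the function returns 3.0. The running time is cubic or worse in n, so this is only a way to experiment with small examples.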

Our scheme for ANN in low-dimensional metric spaces is the first to yield efficient algorithms without relying on any additional assumptions about the input. In previous approaches (e.g., [Cla99,KR02,KL04]), even spaces with \(\dim(X) = O(1)\) sometimes required \(\Omega(n)\) query time.