When Is “Nearest Neighbor” Meaningful?
- Kevin Beyer, CS Dept., University of Wisconsin-Madison
- Jonathan Goldstein, CS Dept., University of Wisconsin-Madison
- Raghu Ramakrishnan, CS Dept., University of Wisconsin-Madison
- Uri Shaft, CS Dept., University of Wisconsin-Madison
We explore the effect of dimensionality on the “nearest neighbor” problem. We show that under a broad set of conditions (much broader than independent and identically distributed dimensions), as dimensionality increases, the distance to the nearest data point approaches the distance to the farthest data point. To provide a practical perspective, we present empirical results on both real and synthetic data sets that demonstrate that this effect can occur for as few as 10–15 dimensions.
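The effect described above can be illustrated with a small simulation. The following is a minimal sketch, not the authors' experimental setup: it assumes points drawn independently and uniformly from the unit hypercube (only a special case of the broader conditions the abstract refers to), Euclidean distance, and a single random query. The function name `contrast_ratio` and its parameters are hypothetical, introduced only for this illustration.

```python
import math
import random


def contrast_ratio(dim, n_points=1000, seed=0):
    """Return (distance to farthest point) / (distance to nearest point)
    for one random query against n_points uniform random points.

    Assumption: i.i.d. uniform coordinates in [0, 1) and Euclidean distance.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    nearest = math.inf
    farthest = 0.0
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        d = math.dist(query, point)  # Euclidean distance (Python 3.8+)
        nearest = min(nearest, d)
        farthest = max(farthest, d)
    return farthest / nearest


if __name__ == "__main__":
    # A ratio close to 1 means the nearest and farthest points are
    # almost equally far from the query.
    for dim in (2, 10, 15, 100, 1000):
        print(f"dim={dim:5d}  farthest/nearest = {contrast_ratio(dim):.2f}")
```

Under these assumptions the ratio should shrink toward 1 as `dim` grows, which is the sense in which the nearest neighbor stops being distinguishable from the rest of the data.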
These results should not be interpreted to mean that high-dimensional indexing is never meaningful; we illustrate this point by identifying some high-dimensional workloads for which this effect does not occur. However, our results do emphasize that the methodology used almost universally in the database literature to evaluate high-dimensional indexing techniques is flawed, and should be modified. In particular, most such techniques proposed in the literature are not evaluated versus simple linear scan, and are evaluated over workloads for which nearest neighbor is not meaningful. Often, even the reported experiments, when analyzed carefully, show that linear scan would outperform the techniques being proposed on the workloads studied in high (10–15) dimensionality!
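For reference, the linear scan baseline mentioned above can be sketched in a few lines. This is an illustrative in-memory version assuming Euclidean distance; the name `linear_scan_nn` is hypothetical, and the sketch does not model the disk access costs that an actual indexing evaluation would measure.

```python
import math


def linear_scan_nn(points, query):
    """Return (index, distance) of the point closest to query.

    Visits every point exactly once; no index structure is used.
    Assumption: Euclidean distance over equal-length coordinate sequences.
    """
    best_index, best_dist = -1, math.inf
    for i, point in enumerate(points):
        d = math.dist(point, query)
        if d < best_dist:
            best_index, best_dist = i, d
    return best_index, best_dist


if __name__ == "__main__":
    data = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
    print(linear_scan_nn(data, (1.0, 2.0)))
```

Because the scan touches each point once regardless of dimensionality, it gives a simple cost floor against which an index structure can be compared.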
- Book Title
- Database Theory — ICDT’99
- Book Subtitle
- 7th International Conference Jerusalem, Israel, January 10–12, 1999 Proceedings
- pp 217-235
- Series Title
- Lecture Notes in Computer Science
- Publisher
- Springer Berlin Heidelberg
- Copyright Holder
- Springer-Verlag Berlin Heidelberg
- Editor Affiliations
- 4. Institute of Computer Science, The Hebrew University
- 5. Department of Computer and Information Science, University of Pennsylvania
- Author Affiliations
- 6. CS Dept., University of Wisconsin-Madison, 1210 W. Dayton St., Madison, WI 53706